The Ninth Image Understanding Workshop was conducted under the auspices of Major Larry E.
Druffel, United States Air Force, program manager for the I.U. Research Project in the Information
Processing Techniques Office, Defense Advanced Research Projects Agency. As outlined at the Eighth
I.U. Workshop, held at Pittsburgh, Pennsylvania, some six months ago, Major Druffel has diligently
worked toward a narrowing of focus in the program as the time approaches for the planned concept
demonstration which, it is hoped, will be the capstone to this five year research effort. As previously
noted, the intended focus is suggested as "the development of the tools needed for inclusion in some
future system", and at this workshop the formulation of a scenario which is to provide a context for the
concept demonstration was discussed in an effort to move toward an agreed upon program. It was
also noted that although the plan is to move toward a concept demonstration in the next two or three
years, the I.U. Program has not lost sight of the need for fundamental research which is required to
support future capabilities.

Major Druffel has stated his appreciation for the effort put forth by all of the government
research and user personnel in attending these periodic workshops and helping to provide guidance
to the ARPA research community. Furthermore, many of the attendees have indicated their recognition
of the value gained by the entire research community through the interaction and cross fertilization
provided by the various researchers and the diversified user community both in the Image Understanding
and in other related research programs.

These proceedings contain the program reviews presented by the Principal Investigators and
Technical Reports prepared by selected research personnel at the Ninth Image Understanding Workshop
held at Palo Alto, California on 24-25 April 1979. In attendance at the workshop, in addition to the
University and Industrial Research Personnel, were representatives from many Army, Navy, Air Force
and Government Agency Organizations interested in the accomplishments of this research program. As
usual, the workshop provided for a lively exchange of views between the potential user community and
those organizations active in the I.U. Research Program. In addition, a panel discussion was conducted
between several government research organizations concerning a plan for utilization of emerging
technology in Image Understanding by the Defense Mapping Agency. Also, participants were afforded
the opportunity to visit the Artificial Intelligence Center at SRI International, the Artificial
Intelligence Laboratory at Stanford University, and the Palo Alto Research Laboratory of the Lockheed
Missiles and Space Company.

The workshop was hosted by Dr. Martin A. Fischler, senior computer scientist at the Artificial
Intelligence Center of SRI International. The workshop organizer wishes to thank Dr. Fischler for
his efforts at making the workshop a success and also to recognize the efforts of Miss Jean Burnet,
Manager of Conference Services at SRI International, for her excellent advice and assistance.
Appreciation is also due Miss Carrie Howell of Science Applications, Incorporated for providing typing
support for mailings and the collection and arrangement of the conference proceedings as well as on-site
assistance during the conference. Typing assistance was also provided by Miss Jackqueline Frye of the
SAI staff.

The cover design was created by Mr. Marco Fillipini of the Art Department of Science
Applications, Inc. from material supplied by Dr. Martin Fischler of SRI International. The map data and
aerial photography shown are representative of that used by the SRI International research group in Image
Understanding on several of their on-going projects. Dr. Fischler states that the caption should read
"Using Map Knowledge to Interpret Aerial Imagery", an important step in the SRI International image
recognition process. An in-depth description of the SRI International Program is contained in this
volume as well as in the previous I.U. Workshop Proceedings.


Lee  S.  Baumann 

Science  Applications,  Inc. 

Workshop  Organizer 
Research at SRI International under the ARPA
Image Understanding Program was initiated to
investigate ways in which diverse sources of
knowledge might be brought to bear on the problem
of analyzing and interpreting aerial images. The
initial phase of research was exploratory and
identified various means for exploiting knowledge
in processing aerial photographs for such military
applications as cartography, intelligence, weapon
guidance, and targeting. A key concept is the use
of a generalized digital map to guide the process
of image analysis.

The  results  of  this  earlier  work  were 
integrated  in  an  interactive  computer  system  called 
"Hawkeye"  [1].  This  system  provides  necessary 
basic  facilities  for  a wide  range  of  tasks  and  a 
framework  within  which  specialist  programs  can  be 
integrated. 

Research  is  now  focused  on  the  development  of 
a program  capable  of  expert  performance  in  a 
specific  task  domain:  road  monitoring.  The 
following  sections  of  this  paper  present  an 
overview  as  well  as  some  recent  technical  results 
produced  in  this  ongoing  effort. 


OBJECTIVE 

The  primary  objective  of  this  research  is  to 
build  a computer  system  that  "understands"  the 
nature  of  roads  and  road  events.  It  should  be 
capable  of  performing  such  tasks  as: 

(1) Finding roads in aerial imagery

(2)  Distinguishing  vehicles  on  roads  from 
shadows,  signposts,  road  markings,  etc. 

(3)  Comparing  multiple  images  and  symbolic 
information  pertaining  to  the  same  road 
segment,  and  deciding  whether  significant 
changes  have  occurred. 

The  system  should  be  capable  of  performing  the 
above  tasks  even  when  the  roads  are  partially 
occluded  by  clouds  or  terrain  features,  or  are 
viewed  from  arbitrary  angles  and  distances,  or  pass 
through  a variety  of  terrains. 


APPROACH 

To achieve the above capabilities, we are
developing two "expert" subsystems: the "Road
Expert" and the "Vehicle Expert." The Road Expert
knows mainly about roads, how to find them in
imagery, and what things belong on them. It works
at low-to-intermediate resolution (e.g., from 1 to
20 feet of ground distance per image pixel) and has
the ability to distinguish vehicles from other road
detail. The Vehicle Expert works on higher-
resolution imagery and can identify vehicles as to
type. We are concentrating our efforts on the Road
Expert and therefore will limit most of our
discussion here to this component of our system.

The  major  tasks  automatically  performed  by  the 
Road  Expert  are: 

(1) Image/map correspondence -- Place a newly
acquired image into geographic
correspondence with the map data base.

(2) Road tracking -- Precisely mark the center
line of selected visible sections of road
in the image.

(3) Anomaly analysis -- Locate and analyze
anomalous objects on, and adjacent to,
the road surface; identify potential
vehicles.

The  image/map  correspondence  task  is 
accomplished  by  locating  roads  and  road  features  as 
landmarks;  correspondence  is  performed  at 
resolutions  as  coarse  as  20  feet/pixel  so  that  a 
reasonably  wide  field  of  view  (10  to  100  square 
miles)  can  be  processed  at  one  time.  It  is 
nominally  assumed  that  the  initial  combinations  of 
uncertainties  about  the  estimates  for  the  camera 
parameters  imply  uncertainties  on  the  ground  of 
approximately  +/-  200  feet  in  X and  Y.  The 
correspondence  procedure  works  iteratively  to 
refine  the  camera  parameters.  A typical  goal  is  to 
reduce  the  implied  uncertainties  on  the  ground  to 
about  +/-  2 feet  in  X and  Y. 
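
The figures quoted above can be related by simple arithmetic. The following sketch is illustrative only and is not part of the Road Expert: the function names are hypothetical, and the field of view is assumed to be square. It converts a ground uncertainty into a window half-width in pixels, and a field of view into an image dimension.

```python
import math

FEET_PER_MILE = 5280.0


def search_radius_pixels(ground_uncertainty_ft, resolution_ft_per_pixel):
    """Half-width, in pixels, of the window implied by a +/- ground uncertainty."""
    return math.ceil(ground_uncertainty_ft / resolution_ft_per_pixel)


def image_side_pixels(area_sq_miles, resolution_ft_per_pixel):
    """Side length, in pixels, of a square field of view (assumed shape)."""
    side_ft = math.sqrt(area_sq_miles) * FEET_PER_MILE
    return round(side_ft / resolution_ft_per_pixel)


# Initial uncertainty of +/- 200 feet at 20 feet/pixel.
print(search_radius_pixels(200, 20))   # 10
# A 100 square mile field of view at 20 feet/pixel.
print(image_side_pixels(100, 20))      # 2640
```

Under these assumptions the initial uncertainty corresponds to a window of about 10 pixels on either side of a predicted landmark position at the coarsest working resolution.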

After the image is placed into correspondence
with our map data base, one or more of the visible
road sections are selected for monitoring. The
road center line and lane boundaries are found to
an accuracy of one to two pixels in imagery with a
resolution of 1 to 3 feet/pixel.

Given  the  precise  road  locations  in  the  image, 
anomalous  objects  are  detected  by  scanning  on  and 
along  the  road  pavement.  These  anomalous  objects 
are  then  identified  as  to  type  (e.g.,  vehicle, 
shadow,  road  surface  marking,  signpost,  etc.). 
[...] similarly shaped anomaly can be found in the same
general location over some extended period of time.
Additional examples of how [...]-map knowledge and
stored models can aid in the analysis process
include: using the time of day in discriminating
shadows from objects of interest; utilizing the
general shape and width of the road (obtained from
a map) as an aid in road tracking; and providing
relevant information on the anticipated size,
shape, and road orientation of potential vehicles.

A central theme of this effort is to consider
roads as a knowledge domain. In particular, we are
addressing the question of how a priori knowledge
can be directly invoked by the image-analysis
modules (what type of knowledge, how should it be
represented, and what are the mechanisms for its
use). To achieve our goal of building a very-high-
performance system, we are developing explicit
models of the image structures we are dealing with
and, additionally, models of the decision
procedures embedded in the image-processing
algorithms so that the algorithms can evaluate
their own performance. Finally, we are planning an
overall control structure that will be concerned
with the problems of coordinating analysis across a
spectrum of levels of resolution and with
integrating multi-source information.


PROGRESS 

Our work to date has provided the capabilities
necessary to assemble an integrated Road Expert
demonstration system, and we are currently planning
to have such a system operational by October 1979.
This system will allow a user to submit new
photographs from a previously "instantiated" site
for automatic analysis in which image scanning,
image-to-data base correspondence, road marking,
and anomaly analysis will be performed "on line".

The demonstration system will also permit both
interactive instantiation of a new site and
selected analysis functions (such as road tracking)
on photographs for which there is no data base
support.

We have previously described [2, 3] our
approach to both the correspondence and road
marking tasks; work continues in these two areas,
both to achieve higher performance and to
generalize the techniques to a wider class of
domains. A more detailed description of this
ongoing work will be deferred until a later time.

In the following two subsections we will
describe recent progress in dealing with the
problem of vehicle detection and anomaly analysis,
and we will discuss our plans for on-line site
instantiation.
The correlation road tracker has been slightly
modified to produce, in addition to the road track,
an image array containing the difference between
the actual brightness in the original image and the
brightness predicted from the road model
(originally this additional output was in the form
of a binary anomaly mask). The value of this
"difference image" is twofold: it can be
thresholded to decide what is and is not anomalous,
and the image with the road profile subtracted out
is useful for analyzing shadows and road
discolorations.
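
As an illustration of the two uses of the difference image, the following minimal sketch subtracts a predicted cross-road brightness profile from each scan line and then thresholds the result. The single fixed profile, the function names, and the threshold value are simplifying assumptions made for this example; they do not describe the road model actually used by the tracker.

```python
def difference_image(image, predicted_profile):
    """Subtract the brightness predicted by the road model from each pixel.

    `image` is a list of cross-road scan lines, one per position along the
    track; `predicted_profile` holds one predicted brightness per cross-road
    column (a hypothetical, simplified road model).
    """
    return [[pixel - predicted for pixel, predicted in zip(row, predicted_profile)]
            for row in image]


def anomaly_mask(diff, threshold):
    """Mark pixels whose absolute difference exceeds the threshold."""
    return [[abs(d) > threshold for d in row] for row in diff]


road = [
    [100, 140, 100],   # clean pavement with a bright center stripe
    [100, 140,  40],   # dark anomaly in the right-hand lane
]
profile = [100, 140, 100]

diff = difference_image(road, profile)
print(diff)                      # [[0, 0, 0], [0, 0, -60]]
print(anomaly_mask(diff, 25))    # [[False, False, False], [False, False, True]]
```

Keeping the signed difference, rather than only the binary mask, preserves the information that the anomaly is darker than the pavement, which is what the shadow analysis needs.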

It  turns  out  that  an  understanding  of  shadows 
is  crucial  in  making  sense  out  of  road  scenes. 
Aerial  scenes  are  often  photographed  in  direct 
sunlight,  and  vehicles  on  the  road  cause  anomalies 
that  include  the  vehicle  plus  its  shadow.  Large 
objects off the road, such as signs, trees, and
utility poles, cast shadows that are noticed by the
anomaly  detector.  In  addition,  the  shadows  can 
give  valuable  clues  to  the  size  and  shape  of  the 
objects  casting  them. 

We employ three basic techniques to identify
shadows. A brightness model allows us to identify
shadows by the absolute brightness of pixels in the
difference image. A predictive model allows us to
identify the portion of an anomaly most likely to
be shadow when we know the position of the sun and
the height of the object casting the shadow.
Finally, a projective model, which tries to locate
the two long parallel sides of a vehicle, can
locate the dividing line between a vehicle and its
shadow.
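
The geometry behind a predictive shadow model can be sketched as follows, assuming level ground, a vertical object, and the sun treated as a distant point source. The function names and the angle conventions (elevation above the horizon, azimuth as a compass bearing in degrees) are assumptions of this example, not a description of the actual implementation.

```python
import math


def shadow_length(object_height, sun_elevation_deg):
    """Length of the shadow cast on level ground by an object of the given height."""
    return object_height / math.tan(math.radians(sun_elevation_deg))


def shadow_direction_deg(sun_azimuth_deg):
    """Compass direction in which the shadow falls: directly away from the sun."""
    return (sun_azimuth_deg + 180.0) % 360.0


def height_from_shadow(shadow_len, sun_elevation_deg):
    """Inverse relation: object height implied by a measured shadow length."""
    return shadow_len * math.tan(math.radians(sun_elevation_deg))


# A 5 foot high vehicle with the sun 45 degrees above the horizon, to the southeast.
print(round(shadow_length(5.0, 45.0), 3))   # 5.0
print(shadow_direction_deg(135.0))          # 315.0
```

The inverse relation is the one that lets an object's height be estimated from a measured shadow once the sun position is known.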

A number of "expert subroutines" examine each
anomaly. The vehicle expert subroutine exploits
the basically rectangular shape of vehicles when
viewed from above. Anomalies that are very much
the wrong size are eliminated at the outset.
Projecting the average brightness and average
gradient magnitude upon a baseline perpendicular to
the presumed direction of vehicle travel enables
finding the shadow and establishing a nominal width
for the vehicle. Height can usually be estimated
from the shadow, and length is inferred from the
size of the total anomaly (allowing for a shadow
fore or aft).
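
The projection step can be illustrated with a small sketch. It assumes the anomaly has been resampled into a patch whose rows are cross-road scan lines and whose successive rows step along the direction of travel, and it separates shadow from vehicle with a single fixed brightness level. Both assumptions, and the function names, are made for this example only; the gradient-magnitude projection mentioned above is omitted.

```python
def column_means(patch):
    """Project a patch onto a baseline perpendicular to the travel direction.

    Each row is one cross-road scan line; averaging down the columns
    collapses the patch along the direction of travel.
    """
    n_rows = len(patch)
    return [sum(column) / n_rows for column in zip(*patch)]


def split_shadow_and_vehicle(profile, shadow_level):
    """Return (shadow_columns, vehicle_columns) for a projected profile.

    Columns darker than `shadow_level` are treated as shadow; the rest of
    the anomaly is treated as vehicle.
    """
    shadow = [i for i, value in enumerate(profile) if value < shadow_level]
    vehicle = [i for i, value in enumerate(profile) if value >= shadow_level]
    return shadow, vehicle


# Difference-image values over an anomaly: two dark columns, three bright ones.
patch = [
    [-50, -55, 30, 35, 32],
    [-52, -48, 28, 33, 30],
]
profile = column_means(patch)
print(profile)                                    # [-51.0, -51.5, 29.0, 34.0, 31.0]
shadow, vehicle = split_shadow_and_vehicle(profile, -25)
print(shadow, vehicle)                            # [0, 1] [2, 3, 4]
print(len(vehicle) * 2, "feet wide at 2 feet/pixel")   # 6 feet wide at 2 feet/pixel
```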

Two  other  anomaly  experts,  the  tree-shadow 
expert  and  the  road  marking  expert,  provide 
alternate  explanations  for  anomalies  not  identified 
as  vehicles.  To  qualify  as  a tree  shadow  (or  the 
shadow  of  some  other  object  off  the  road)  an 
anomaly  must  have  the  appropriate  average 
brightness,  a low  variance  in  brightness,  and  touch 
the  side  of  the  road  at  the  side  nearer  the  sun. 
Road  markings  (usually  painted  arrows  or  speed 
limit  numerals)  are  usually  brighter  than  the  road 
The purpose of the road data base is to enable
the Road Expert to accurately and reliably find
known roads in new images, trace their paths, and
locate anomalies that might be potential vehicles
on the roads. The data base also contains
information to help distinguish vehicles from
permanent road features such as signs and their
shadows, and painted markings on the road surface.

The current road data base contains both
geometric and photometric information. The
geometric part of the road data base was generated
by a variety of means, depending on the level of
detail and accuracy desired. The coarsest level of
data representation was generated by specifying
approximate world location, direction, and width of
road segments, either by typing in numerical
information or by tracing the road in a low
resolution (USGS 7.5 minute series) map of the
area. The most accurate geometric information was
entered into the data base both by typing in
precise numerical data and by manually tracing
portions of "as built" survey plans of the road
obtained from the California Department of
Transportation.

Photometric information associated with a road
segment is inserted into the data base by using the
correlation road tracker; as images of a geographic
site are interpreted by the road tracker, road
photometry models are automatically entered.
Spatially fixed landmarks, such as painted road
surface markings, are (at present) manually
specified; and a corresponding rectangular image
patch is entered into the data base.
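
To make the organization of such a data base concrete, the sketch below shows one way a road segment record holding both geometric and photometric information might be laid out. It is written in Python for illustration; every class and field name is hypothetical and should not be read as a description of the actual record structures.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Landmark:
    """A spatially fixed feature, such as a painted road surface marking."""
    world_position: Tuple[float, float]
    image_patch: List[List[int]]             # rectangular patch of brightness values


@dataclass
class RoadSegment:
    """One road segment with its geometric and photometric descriptions."""
    center_line: List[Tuple[float, float]]   # world coordinates along the segment
    width_ft: float
    source: str                              # e.g. "map trace" or "as-built survey"
    photometry_models: List[List[float]] = field(default_factory=list)
    landmarks: List[Landmark] = field(default_factory=list)

    def add_photometry(self, cross_section: List[float]) -> None:
        """Record a cross-section brightness model produced by a road tracker."""
        self.photometry_models.append(list(cross_section))


segment = RoadSegment(center_line=[(0.0, 0.0), (500.0, 20.0)],
                      width_ft=24.0, source="map trace")
segment.add_photometry([100.0, 140.0, 100.0])
segment.landmarks.append(Landmark((250.0, 10.0), [[200, 200], [200, 200]]))
print(len(segment.photometry_models), len(segment.landmarks))   # 1 1
```

Storing the origin of the geometry with each segment reflects the point made above that entries differ in level of detail and accuracy depending on how they were generated.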

The data base is currently implemented using
SAIL record structures which conveniently provide
graph structures, lists, numeric arrays, etc. A
general-purpose record structure I/O package
communicates these structures between SAIL programs
and disk files. We recognize the need, in the
future, to develop a file representation that can
be communicated to LISP programs.

We intend to include examples of data base
construction as a part of the Road Expert
demonstration and are working toward a scenario of
the following type. An image of a site will be
scanned and digitized at approximately one to three
feet per pixel resolution; and a photo interpreter
will then indicate the approximate locations of
primary road segments in the image, using a track
ball. The automatic road tracker program will be
invoked to accurately trace the roads, generate
cross-section photometry models, and detect
anomalies that might be permanent surface markings.
The anomaly analysis techniques described in the
preceding subsection will specify which anomalies
CONCLUDING COMMENTS

We see the military relevance of our work
extending well beyond the specific road monitoring
scenario presented above. In particular, a Road
Expert can be applied to such problems as:

(1) Intelligence -- Monitoring roads for
movement of military forces

(2) Weapon guidance -- Use of roads as
landmarks for "map-matching" systems

(3) Targeting -- Detection of vehicles for
interdiction of road traffic

(4) Cartography -- Compilation and updating of
maps with respect to roads and other
linear features (especially those
concerned with transportation), such as
airport runways, railroads, rivers, etc.

In accord with our generalized view of the
applicability of the Road Expert and the knowledge-
based, image-analysis techniques we are
constructing, we are attempting to achieve a level
of performance and understanding in each of the
functional tasks that far exceeds that which would
be required for dealing with the road-monitoring
scenario alone.
Abstract

We describe a program of research which seeks
mechanisms to solve fundamental problems in image
understanding. The research is applied to a set of problems
from cartography, photointerpretation and guidance. We aim to
integrate solutions across this range of problems by defining a
hierarchy of representations within a single system,
ACRONYM. Past work has laid the foundation of
representation by generalized cones, and use of this
representation in identification of complex objects. We have
built a system which demonstrates automated stereo registration
and separation of objects from the ground surface. Stereo and
motion parallax perception algorithms based on area correlation
and edge matching have been developed. ACRONYM, the
model-based vision system, has been designed and largely
implemented. Recent work with ACRONYM has concentrated
on expanding its geometric reasoning capability while
concentrating on identifying aircraft at an airfield. Brooks has
demonstrated ACRONYM's rule-based mechanism for
segmenting images into well-formed regions. Work in stereo has
been applied to navigation using passive sensing.


Introduction 


This  report  covers  the  Image  Understanding  program  at 
Stanford  University,  known  as  Spatial  Understanding,  and  the 
Image  Understanding  program  with  Lockheed  as  principal 
contractor,  concerning  navigation  using  passively  sensed  images. 
We  have  chosen  typical  tasks  from  cartography, 
photointerpretation,  and  guidance  as  focus  for  our  research. 

1. A photointerpreter monitors an airfield. The system identifies
and counts aircraft at frequent intervals to monitor air traffic.

2. An interpreter monitors a building complex for changes. The
system uses stereo, a model of the complex, and identification to
distinguish insignificant from significant changes. The system
might not notify the interpreter about changes from snow or
rain, or moving a vehicle, but would notify him about building
additions.

3. An interpreter monitors vehicles in staging areas. The system
identifies vehicles and monitors traffic to and from the area.

4.  A low-flying  craft  navigates  with  a flight  path  which  may  be 
changed  at  will,  with  low  memory  requirements  from  images  or 
symbolic  reference  maps  using  passive  sensing  by  sequences  of 
fixes  using  stereo  and  motion  parallax  vision. 


Reports in these proceedings describe progress with task 1
[Binford and Brooks] and task 4 [Firschein et al]. Previous
reports show promising results with task 3 [Binford 1977]. We
think these results sufficient to predict success soon in
demonstrations of these tasks. Further, we believe that the
ACRONYM system which supports the first three tasks will
generalize to many other problems. To carry these
demonstrations from a first success to reliable accomplishment
requires better image feature description and improved ability
to use shape in interpretation.

Our research has focused on ways to use shape in
interpretation and ways to determine and describe surface shape
from stereo or motion parallax. Traditionally, Image
Understanding tasks have been framed as image matching tasks.
We take a very different approach. Image Understanding tasks
are spatial matching tasks. There begins to be an understanding
that classic image matching encounters difficulties for some
important problems. The image matching paradigm is: predict
an image, then match against the image. That procedure
encounters severe problems: consider visible images in which
surface markings or camouflage, movable objects, snow, rain,
clouds, seasonal changes of vegetation all change images. While
it is extremely difficult to predict images accurately, it is easy to
predict parts of images which give spatial information. These
predictions can be symbolic. The spatial matching paradigm is:
predict spatial information in the image; infer spatial structure
from the image; match reference structure with the perceived
structure. The ability to infer spatial structure is a key research
topic, yet one in which there is beginning to be progress. It is a
medium step from image matching to spatial matching, but a
step with great payoff.
ACRONYM

ACRONYM is a model-based image understanding
system. It demonstrates mechanisms for interpretation of images
of generic object classes in generic viewing conditions.
ACRONYM is designed to be generalizable. It incorporates a
powerful geometric modeling capability with a high level
modeling language for natural communication with the user in
terms of object models. A user gives high level descriptions of
both generic and specific instances of objects. A rule-based
inference system produces a symbolic summary of the predicted
appearance of the objects. This geometric reasoning capability
enables the system to incorporate and relate knowledge and
information at different levels. The summary of predicted
appearance drives a powerful syntactic matcher to find instances
of the objects among features obtained by image segmentation
procedures.

ACRONYM is intended to be generalizable in the
following sense: Systems for different tasks should be
constructed from a large core of common modules and a small
set of task-specific modules. It is being tested on aircraft initially,
later on vehicles, then buildings.

The system has not yet found an instance of an object in
a digitized image given only a high level description of the
object class and low level descriptors of the image. However,
that achievement appears near. All the necessary mechanisms
for such a test are operational in at least prototype form, and
have been tested individually or as subsystems smaller than the
total ACRONYM system. As more rules are written for the
Predictor and Planner we expect to soon be able to run a
complete test.

The High-level Modeler has been used to construct a
large number of models. Object models are graphs whose
primitive parts are generalized cones. These models are very
compact. They have a natural set of levels of detail which is
utilized for efficiency in identification. There is a high level
language for modeling, and a geometric editor which begins with
interactive facilities like GEOMED and a library of primitives, and
extends to rule-based reasoning about spatial relations. We
are involved with mathematical analysis of generalized cone
representation and analytic hidden surface algorithms.
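
The idea of an object model as a graph of generalized cones with levels of detail can be sketched as follows. A generalized cone is taken here to be a cross-section swept along a spine according to a sweeping rule; the class names, the string-valued fields, and the traversal function are hypothetical and are not ACRONYM's modeling language.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GeneralizedCone:
    """A volume swept by a cross-section moved along a spine."""
    spine: str            # e.g. "straight line"
    cross_section: str    # e.g. "circle"
    sweeping_rule: str    # how the cross-section changes along the spine


@dataclass
class ObjectNode:
    """A node of an object graph: one part plus its finer-detail subparts."""
    name: str
    cone: GeneralizedCone
    subparts: List["ObjectNode"] = field(default_factory=list)


def parts_at_level(node, level):
    """Names of the parts seen when the model is expanded `level` steps deep."""
    if level == 0 or not node.subparts:
        return [node.name]
    names = []
    for part in node.subparts:
        names.extend(parts_at_level(part, level - 1))
    return names


cylinder = GeneralizedCone("straight line", "circle", "constant")
slab = GeneralizedCone("straight line", "thin rectangle", "linear taper")
airplane = ObjectNode("airplane", cylinder, [
    ObjectNode("fuselage", cylinder),
    ObjectNode("left wing", slab),
    ObjectNode("right wing", slab),
])
print(parts_at_level(airplane, 0))   # ['airplane']
print(parts_at_level(airplane, 1))   # ['fuselage', 'left wing', 'right wing']
```

Stopping the expansion at a coarse level is one way the levels of detail could be used for efficiency: a candidate can be rejected on the coarse description before any finer parts are examined.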

Given models of objects, ACRONYM attempts to find instances
of the objects in images. The Observability Graph will tell the
matcher how to find instances. It is a symbolic summary of the
expected appearance of objects in the image. It contains generic
and specific predictions about shape elements and relations
between them, with information about how to find them, what
conclusions to draw if identified and what to conclude if they
are not there. If information about viewing angles, distance and
conditions is available, it can be used to produce more definite
predictions. The Predictor and Planner module is a rule-based
system which uses the Object Graph to produce the
Observability Graph.

The program must choose features of the object which
correspond to image and surface properties which segmentation
programs can find. We call such features "observables". The
features which we are using first include shape and two
dimensional spatial relations of shapes within the image. When
dealing with a stereo pair of pictures we also use three
dimensional spatial relations and surface information.


Since the exact disposition of objects in the image is not known
in advance, the Observability Graph can not contain an exact
prediction of what will be seen. Rather, it must consist of
predictions which adequately describe a range of possible
appearances, generic with respect to both object class and
viewpoint. They can best be thought of as supplying constraints
on the way the picture can be expected to look. For instance,
given that the system will be examining aerial photographs of
airplanes on the ground, the predictor and planner can tell the
matcher that when a candidate fuselage is found, wings should
be found adjacent to it, with bilateral symmetry. There will be
two possibilities for the relative angle, and once the actual angle
has been found, the front and rear of the fuselage will have
been distinguished. Then the finer task of locating the rear
stabilizers (for positive identification of airplane type) can be
carried out, confined to a small part of the image.

The best sort of observables to put into the Observability
Graph are those which are invariant. This is a very strong
requirement, and will often be hard to meet. Often, however,
there are predictions which can be made which will be true
under a wide range of viewpoints. The prediction of where to
look for the wings is an example - sometimes a wing may be
obscured by a shadow cast by a nearby building, but most
airplanes will be out in the open, with both wings visible. Thus
some observables are almost invariant over a wide range of
viewing conditions. We call these quasi-invariant observables.
Some are quasi-invariant with respect to object class, while
others are quasi-invariant with respect to viewing conditions.
For example all airplanes have a long cylindrical generalized
cone as the fuselage (invariant with respect to object class), and
from most viewpoints (especially aerial views of airplanes on the
ground) the fuselage will appear as an elongated ribbon
(invariant with respect to viewing conditions). Functional
observables can be viewed as quasi-invariants too, as they will
work well, almost independently of the function parameters. For
example, the orientations of the wings within an image are
functions of the orientation of the fuselage. Conditional
observables are a special case.

ACRONYM is the first vision system to incorporate a general
reasoning system. It is necessary because we wish to predict the
appearance of generic objects, from generic viewpoints. The
Predictor and Planner has been run on a specific model of an
aircraft and a generic model of an airport. It produced the
complete node structure of the Observability Graphs in both
cases. The detailed rule base must be completed to carry out the
aircraft identification.

The  Matcher  has  been  tested  on  a hand  coded 
Observability  Graph,  matching  against  a hand  coded  Picture 
Graph. 
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Well-Formed  Ribbons 

Brooks describes an algorithm for linking straight edge
elements produced from a bottom up line finding stage [Brooks].
Edges are linked by a best first search algorithm. Heuristics are
used both to select candidate edges for linking and to prune the
search tree. A list of chains of edges is the result of this stage of
processing. The choice of heuristics determines the chains of
edges produced. Further heuristics prune the list of chains.
Finally regions, described as ribbons, are chosen so that their
boundaries are approximated by the chains of edges. Individual
picture elements may appear in multiple ribbons. Further
heuristics can be employed to reduce this multiplicity. The
algorithm is organized so that a higher level computational
procedure can easily direct it by supplying and altering
the heuristics during processing. A selection of useful heuristics
is described and then shown working on an example picture,
with a hand simulated control program, resulting in effective
descriptions of the image.
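
The linking stage described above can be illustrated with a minimal sketch. It is not Brooks' program: the segment representation, the cost (gap length plus turning angle) and the pruning limits `max_gap` and `max_turn` are all assumptions made here for illustration. It only shows the general shape of a best first search in which one heuristic scores candidate edges and another prunes the search tree.

```python
import heapq
import math


def link_cost(a, b, max_gap=3.0, max_turn=math.radians(30)):
    """Cost of linking segment b after segment a, or None if pruned.

    Segments are ((x0, y0), (x1, y1)). The gap and turn limits are
    hypothetical heuristics, not values from the report.
    """
    (ax0, ay0), (ax1, ay1) = a
    (bx0, by0), (bx1, by1) = b
    gap = math.hypot(bx0 - ax1, by0 - ay1)
    turn = abs(math.atan2(by1 - by0, bx1 - bx0)
               - math.atan2(ay1 - ay0, ax1 - ax0))
    turn = min(turn, 2 * math.pi - turn)
    if gap > max_gap or turn > max_turn:
        return None  # pruning heuristic: reject implausible links
    return gap + turn  # selection heuristic: prefer close, straight links


def best_first_chain(segments, start):
    """Grow chains from segments[start], cheapest partial chain first.

    Returns the longest chain found, as a list of segment indices.
    """
    tie = 0  # unique tie-breaker so the heap never compares chains
    frontier = [(0.0, tie, [start])]
    best = [start]
    while frontier:
        cost, _, chain = heapq.heappop(frontier)
        if len(chain) > len(best):
            best = chain
        last = segments[chain[-1]]
        for j, seg in enumerate(segments):
            if j in chain:
                continue
            step = link_cost(last, seg)
            if step is None:
                continue
            tie += 1
            heapq.heappush(frontier, (cost + step, tie, chain + [j]))
    return best
```

Because the heuristics are ordinary function arguments, a higher level procedure could swap them between calls, which is the kind of control the text describes.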


Fig. 1. An L1011.


Work is underway to transport the line finding programs
of Nevatia and Babu (1978) to our laboratory. These may be
used to extend goal-direction down to the line finding level.
Such a capability should prove useful when the Predictor and
Planner has an understanding of the process of shadow
formation.


Navigation  using  Passive  Sensing 

The objective of the study is to show passive sensing
techniques, particularly those based on stereo, which provide a
combination of fixes on landmarks, continuous tracking of
linear features where applicable, and dead reckoning. The
ability to improve dead reckoning by estimating true velocity
and wind drift is one benefit. A major benefit is the flexible
flight planning possible. The required image base is small,
particularly with linear features. Thus, a grid of check points
can be accommodated.


The Stanford stereo system is being tested on images from
the Night Vision Lab terrain model. The stereo system obtained
automatic registration and a camera model for the two images.
A terrain map will be obtained and compared with the NVL
terrain map. Lockheed is evaluating the stereo system for two
roles: terrain mapping of relative elevations and terrain
matching; measuring velocity/height by using the camera model
solver to compute change in attitude on a non-stabilized
platform. Lockheed has evaluated effects of structural flexure on
absolute altitude accuracy for configurations of multiple
cameras. Acceptable accuracy does seem obtainable. Of course,
relative accuracy is easier to maintain.

Automatic  Registration  of  Stereo  Images 

Research  has  demonstrated  high  potential  for  utility  of 
passive  imaging  techniques  for  high  resolution  depth 
measurement.  Passive  techniques  have  important  advantages 
over active ranging techniques in hostile environments.
Sequences  of  images  from  a moving  aircraft  have  been  used  to 
find  the  ground  plane  and  separate  objects  from  ground.  The 
system  should  be  effective  with  camouflaged  surfaces.  The 
accuracy  attainable  has  been  demonstrated  to  be  2*  height  error 
for  3’  horizontal  pixel  size  on  the  ground,  with  a 60  degree 
baseline.  On  a general  purpose  computer,  the  process  requires 
about  15  seconds  with  no  guidance  information.  That  can  likely 
be  reduced  at  least  a factor  of  2.  With  accurate  guidance 
information,  the  time  required  is  estimated  to  be  about  250  msec 
(most  missions  would  probably  be  in  this  category).  The  system 
is  self-calibrating  and  highly  reliable. 

The  system  includes  a solution  to  the  problem  of 
determining  the  stereo  camera  model  from  information  within 
the  pair  of  pictures.  Imagine  an  aircraft  approaching  a runway. 
As  it  moves,  objects  on  both  sides  appear  to  move  radially 
outward  from  a center,  the  fixed  point.  The  fixed  point  is  the 
instantaneous  direction  of  motion.  The  pilot  knows  that  the 
point  which  does  not  appear  to  move  is  where  he  will  touch 
down,  unless  he  changes  direction.  The  distance  between  views 
and  the  apparent  displacement  of  points  allow  calculation  of  the 
distance  of  each  point  from  the  observer  and  from  the  vehicle 
path.  The  touchdown  point  can  be  calculated  from  the 
trajectory of centers. That is precisely what is done by the
camera transform solver in the system [Gennery]; it determines
the transform from one view to another in a sequence of views
from a moving observer.
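
The fixed point idea can be sketched for the simplest case, a camera that translates without rotating. This is not Gennery's camera transform solver, which recovers the full transform between views; the function name and the least-squares formulation below are assumptions made for illustration. Each displacement vector is taken to lie on a line through the fixed point, and the point closest to all such lines is found.

```python
def focus_of_expansion(points, displacements):
    """Least-squares fixed point of a radially expanding flow field.

    points:        image positions (x, y) in the first view
    displacements: apparent motion (dx, dy) of each point
    A displacement (dx, dy) at (x, y) lies on a line through the fixed
    point (ex, ey) when  dy*ex - dx*ey = dy*x - dx*y.
    """
    saa = sab = sbb = sac = sbc = 0.0
    for (x, y), (dx, dy) in zip(points, displacements):
        a, b, c = dy, -dx, dy * x - dx * y
        saa += a * a
        sab += a * b
        sbb += b * b
        sac += a * c
        sbc += b * c
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("displacements are parallel; fixed point undefined")
    # Solve the 2x2 normal equations by Cramer's rule.
    ex = (sac * sbb - sab * sbc) / det
    ey = (saa * sbc - sab * sac) / det
    return ex, ey
```

With noisy matches the same equations give a best-fit point, which is why more than two displacement vectors are useful.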

The program first orients itself in the scene and finds a
model for the transform between the two cameras. This step
takes 60% of the time required for finding the ground plane. If
two views are an accurately calibrated stereo pair, this operation
is not necessary. If accurate guidance information is available,
this operation can be speeded up enormously. The program
finds a camera transform model by finding a sample of features
of interest in one image and matching them with their
corresponding view in the other image. The Interest Operator
requires about 75 msec for a 256x256 frame. Interesting features
are areas (typically 8x8) which can be localized in two
dimensions without a camera transform. The operator chooses
those areas with large variance along all grid directions. That is
roughly equivalent to a large drop in autocorrelation along all
grid directions, which means that the area can be localized
closely. The correlator matches the features in the other image
by a coarse to fine strategy: it has versions of the picture at
resolutions of 256x256, 128x128, 64x64, 32x32, and 16x16. It first
matches by a small search on a very coarse version (16x16) of
the image. It then performs a search in the next finer version of
the image, in the neighborhood of the best match in the
previous image. That step is repeated until the full resolution is
reached. The matching process requires only 50 msec for a match
of a single feature no matter where it is in the image. If the
camera transform is known, search is necessary only along a line
in the image. In this case, search is about a factor of seven
faster. If the depth of neighboring points is used as a starting
point for the search, the match is another factor of seven faster.
It is planned to incorporate those speedups; neither is now used.
The matching has about 10% errors. It encounters fewer
ambiguities than brute force matching, since not only must the
feature match, but the surrounding context must also match.
The procedure should not work for parts of scenes where the
background of objects (context) changes drastically from one
view to another. This is true only at very wide angles and close
range. Aerial views are mostly planar, so failure of matching
should not be a problem, nor has it been in practice. The
process requires about 50k of 36 bit words now. It is possible to
implement the coarse-to-fine search strategy in a raster scan and
keep only a portion of each image in core. This would cut
memory size by a large amount, but it has not been done.
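
The coarse to fine strategy can be sketched as follows. This is an illustration under stated assumptions, not the correlator itself: images are plain lists of rows, coarser versions are made by averaging 2x2 blocks, the match score is a mean squared difference over a small window, and the parameters `levels`, `half` and `radius` are hypothetical. The report's system uses five resolutions from 256x256 down to 16x16; the default here is three so that small test images work.

```python
def halve(img):
    """Average 2x2 blocks to produce the next coarser version."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)] for r in range(h)]


def pyramid(img, levels):
    """Return [full resolution, ..., coarsest]."""
    out = [img]
    for _ in range(levels - 1):
        out.append(halve(out[-1]))
    return out


def window_cost(a, ar, ac, b, br, bc, half):
    """Mean squared difference of windows centred at (ar, ac) and (br, bc).

    Pixel pairs that fall outside either image are skipped.
    """
    total, n = 0.0, 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            r1, c1, r2, c2 = ar + dr, ac + dc, br + dr, bc + dc
            if (0 <= r1 < len(a) and 0 <= c1 < len(a[0])
                    and 0 <= r2 < len(b) and 0 <= c2 < len(b[0])):
                total += (a[r1][c1] - b[r2][c2]) ** 2
                n += 1
    return total / n if n else float("inf")


def coarse_to_fine_match(left, right, row, col, levels=3, half=2, radius=2):
    """Find where the feature at (row, col) in left appears in right."""
    pl, pr = pyramid(left, levels), pyramid(right, levels)
    scale = 2 ** (levels - 1)
    gr, gc = row // scale, col // scale  # initial guess: no displacement
    for lvl in range(levels - 1, -1, -1):
        fr, fc = row // 2 ** lvl, col // 2 ** lvl  # feature at this level
        best = None
        # Small search around the guess inherited from the coarser level.
        for r in range(gr - radius, gr + radius + 1):
            for c in range(gc - radius, gc + radius + 1):
                cost = window_cost(pl[lvl], fr, fc, pr[lvl], r, c, half)
                if best is None or cost < best[0]:
                    best = (cost, r, c)
        gr, gc = best[1], best[2]
        if lvl > 0:
            gr, gc = 2 * gr, 2 * gc  # project the match to the finer level
    return gr, gc
```

The search cost per level is fixed by `radius`, so the total work does not depend on how far the feature has moved, which matches the remark that a match takes the same time wherever the feature lies in the image.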

The program automatically determines the transform
between the two views. Given corresponding views of five
points which are non-degenerate (i.e. no collinear and planar
degeneracies) the relative transform of the two views can be
found. It is not necessary to know the position of these points,
only two views that correspond. The transform is determined
except for a scale factor, the length of the stereo baseline. That
does not affect subsequent matching of two views using the
transform, and the scale factor can often be determined from
known scene distances or guidance information. If the scene is
nearly flat, then certain parameters are ill-determined. However,
that does not affect the accuracy of measuring heights using the
transform. In the present form of the camera transform solver, it
sometimes encounters stability problems in degenerate cases. It
may be possible to improve that. If the scene is nearly flat, then
a special simplified form of the solver can be used.

The special case version has been used on some images. It
is much faster than the full transform solver. Part of the job of
the transform solver is to deal with mistaken matches. The
procedure calculates an error matrix for each point and iterates
by throwing out wild points. It calculates an error matrix from
which errors in depths of point pairs are calculated. The solver
uses typically 12 points and requires about 300 msec per point.
It requires about 20k of memory. It requires 60% of the time for
finding the ground plane. With accurate guidance information,
this operation would not be necessary. However, it can be used
directly to find the instantaneous direction of the vehicle. As
mentioned above, as the vehicle moves, points in the image
appear to move radially away from the center which is the
instantaneous direction of the vehicle. Three angles relate the
coordinate system of one view with the other, and two angles
specify the instantaneous direction of motion.
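
The parameter count in this paragraph can be made explicit. What follows is the standard formulation of relative orientation, offered as a sketch; the symbols R, t, x and x' are notation introduced here and do not appear in the report.

```latex
% Relative orientation of two views (notation assumed for illustration):
%   R   rotation between the two camera frames     (3 angles)
%   t   unit vector along the direction of motion  (2 angles)
% A point seen along ray x in the first view and along ray x' in the
% second view must satisfy the coplanarity condition
\[
  \mathbf{x}'^{\top}\,\bigl(\mathbf{t} \times R\,\mathbf{x}\bigr) = 0 .
\]
% Each matched point contributes one such equation, so the 3 + 2 = 5
% unknowns need at least five non-degenerate matches. The equation is
% homogeneous in t, so the baseline length cannot be recovered from
% image matches alone.
```

Using about 12 points rather than the minimum of five, as the solver does, leaves redundancy for estimating errors and discarding wild points.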


Ground  Plane 

The camera transform model makes it economical to make
a denser depth map. A point in one view corresponds to a ray
in space which corresponds to a line in the other view. The
search is limited to this line, and in addition, nearby points
usually have about the same disparity as their neighbors. Thus,
search is limited to a small interval on a line. A high resolution
correlator has been developed which interpolates to the best
match, and which calculates the precision of the match based on
statistics of the area.

The system then finds a best ground plane or ground
parabola to the depth points, in the least squares sense. It gives
no weight to points above the ground plane; it expects many of
those. It includes points below the ground plane and near the
ground plane. Since points below the ground plane may be wild
points, they are edited out in an iterative procedure. Of course,
there may be holes. If they are small, there is no problem. If the
hole is big, it becomes the ground plane. The ground plane
finder requires 5 msec per point.
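
A minimal sketch of such a ground plane finder is given below. It is an interpretation of the description above, not the system's code: only the plane case is handled (not the parabola), height is taken as increasing upward, and the tolerance `tol`, the one-point-per-pass editing rule and the function names are assumptions made for illustration.

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(a[r][i]))
        if abs(a[p][i]) < 1e-12:
            raise ValueError("degenerate point set")
        a[i], a[p] = a[p], a[i]
        for r in range(i + 1, 3):
            f = a[r][i] / a[i][i]
            for c in range(i, 4):
                a[r][c] -= f * a[i][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (a[i][3] - sum(a[i][c] * x[c] for c in range(i + 1, 3))) / a[i][i]
    return x


def fit_plane(pts):
    """Least-squares plane z = a*x + b*y + c through (x, y, z) points."""
    sxx = sxy = sx = syy = sy = sxz = syz = sz = 0.0
    for x, y, z in pts:
        sxx += x * x
        sxy += x * y
        sx += x
        syy += y * y
        sy += y
        sxz += x * z
        syz += y * z
        sz += z
    n = float(len(pts))
    return solve3([[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]],
                  [sxz, syz, sz])


def ground_plane(points, tol=0.5, max_iter=20):
    """Fit a ground plane, ignoring points above it and editing wild ones."""
    kept = list(points)
    for _ in range(max_iter):
        a, b, c = fit_plane(kept)

        def resid(p):
            return p[2] - (a * p[0] + b * p[1] + c)

        # Points well above the plane (objects) get no weight.
        survivors = [p for p in kept if resid(p) <= tol]
        # The worst point below the plane is treated as wild and removed.
        worst = min(survivors, key=resid)
        if resid(worst) < -tol:
            survivors.remove(worst)
        if len(survivors) == len(kept):
            break  # nothing was edited out: the fit is stable
        kept = survivors
    return tuple(fit_plane(kept))
```

The asymmetry is deliberate: points above the plane are expected and dropped freely, while points below are removed one at a time so that a single wild point cannot pull good ground points out with it.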

Edge-Based  Stereo 

Feature-based stereo using edges increases the accuracy
with which boundaries of depth discontinuities can be found by
about a factor of 25. It also provides additional information
about surface markings which are not available in stereo based
on area correlation. Feature-based stereo is also potentially very
fast, although now area-based techniques are considerably faster.
Edge-based techniques have not been developed very far, and
would benefit from "smart sensor" technology. A new technique
has been developed to use edge features in stereo. Edges are
linked along smooth curves in 3D (in the image coordinates and
in depth). The new technique is used in the object modeling
and recognition modules of the system. Those edges out of the
ground plane delimit bodies, if isolated.

In a stereo pair of images of an aircraft at San Francisco
airport, the succession of edges matched in stereo as a function
of height showed a separation in height between wing tips and
wing roots of an L1011.

Vehicle  Location 

Arnold demonstrated progress toward vehicle location and
identification. The system registered images of a suburban
parking lot and obtained the stereo camera model. It separated
the vehicles from the ground and succeeded in describing the
projection of a car by a rectangle of approximately the right size
and orientation. The length and width of the car were accurate
to about 5% by inspection. A sequence of steps is shown in
figures 2 through 4 which leads to description of the car by a
rectangular outline of edges above the ground.

Recognition  of  Complex  Objects 

Nevatia developed a system which obtained depth maps of a
doll, a toy horse, a glove, a hammer and a ring from a laser
triangulation system [Nevatia 1974]. These
depth maps were described in terms of generalized cones which
were organized into part/whole structures. The system
recognized different views of these objects with articulation of
limbs and some obscuration. The system addressed important
issues of representation: indexing into a subclass of similar
objects in visual memory instead of matching against all stored
models; determining generalized cone descriptors from surface
descriptors.

Figure 2. Features of interest

Figure 3. Linked edges near ground (left) and above ground

Figure 4. Rectangular parallelepiped fit

IMAGE UNDERSTANDING RESEARCH AT USC


Ramakant  Nevatia  and  Alexander  A.  Sawchuk 


Image  Processing  Institute 
University  of  Southern  California 
Los  Angeles,  California  90007 


The following is a brief summary of our recent work. The
details of this work can be found in a forthcoming technical
report [1].

IMAGE UNDERSTANDING PROJECTS

We have continued the development of an overall system
for locating desired features in aerial images. A model is
supplied to this system by a user via an interactive dialog
system. Our system uses both edge and region segmentation
techniques, guided by a user supplied model. Symbolic
descriptions of images are matched with a symbolic model for
locating desired features. Figure 1 shows an image of the San
Francisco area, Fig. 2 shows a sketch of a user supplied model,
and Fig. 3 shows a partial internal model. Figure 4 shows the
identified regions and linear segments; the desired airports are
located for further analysis (shown as being bounded by
rectangles). Details of this system have been described in
previous progress reports [2].

Further progress has been made on our techniques for
detection of roads and similar structures, bounded by locally
linear and locally parallel boundary segments of opposite
directions. Improved results can be obtained by bridging gaps
that are caused by a single missing edge. Figure 5 shows an
aerial image, linear segments detected in it, detected roads
before bridging gaps, and roads after bridging gaps. Details of
this technique are described in [3,4]. We have also initiated
development of a program to recognize structures that are well
described by such description techniques. Initially, we are using
examples of airports, which are usefully characterized by the
arrangements of their runways and taxiways.

We have developed a new structural texture description
technique that attempts to find repetitive patterns in edges of
images. Periodic textures, such as in cities, are easily
discriminated and described. Many of the results are
preliminary but promising. Details of this technique are
described in a separate paper in these proceedings [5].

Statistical texture measurement techniques have been
further investigated by Dr. W.K. Pratt and his associates, Dr.
O.D. Faugeras and Mr. K. Laws. On the images tested, these
techniques offer high probability of correct classification. The
measures are relatively simple and hardware implementation
seems feasible. In another project, texture feature extraction
using the singular values of a texture field as vector components
has been investigated.

IMAGE PROCESSING PROJECTS

Image processing projects during the last six months are
concerned with image processing system architecture, image
restoration, and radar image formation. A novel architecture for
performing two-dimensional convolution with a minimum
amount of hardware and fewer numerical operations has been
developed. The technique involves repeated sequential
convolution with small generating kernels to approximate the
results of convolution with large kernels.

Several projects in image restoration have been active. An
algorithm for computing the condition number of a Wiener
image restoration operator as a means of predicting the
numerical accuracy of the restoration process has been
developed. Work on image restoration for blurred images
subjected to Poisson sensor noise has been continued. The
restoration technique is a moving window nonlinear filter which
uses maximum a posteriori (MAP) and maximum likelihood
(ML) estimation criteria. The experimental results indicate some
improvement over traditional linear Wiener filters, particularly
in the difficult situation of signal-dependent Poisson sensor
noise. Additional theoretical effort has been directed to deriving
Cramer-Rao lower bounds on the theoretical estimation error. A
new technique of blind deconvolution (a posteriori image
restoration) using algebraic minimization of an error criterion
by choice of filter weights has been developed. The experimental
results indicate that the technique produces improvement with
fairly small computing effort.

A final project in the image processing area concerns
synthetic-aperture radar signal processing. An analysis of errors
associated with data sampling in the polar domain has been
made, and a detailed treatment of synthetic aperture radar
systems from an image processing context is being completed.
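
The idea of replacing one large kernel by repeated convolution with a small generating kernel can be shown with a small sketch. The report does not give the kernels or the architecture, so the 3x3 binomial kernel used here is an assumption chosen because it makes the equivalence exact; for a general large kernel the sequential result would only be an approximation, as the text says.

```python
def conv2d_full(a, b):
    """Full 2-D convolution of two arrays given as lists of rows."""
    ha, wa, hb, wb = len(a), len(a[0]), len(b), len(b[0])
    out = [[0] * (wa + wb - 1) for _ in range(ha + hb - 1)]
    for i in range(ha):
        for j in range(wa):
            for k in range(hb):
                for l in range(wb):
                    out[i + k][j + l] += a[i][j] * b[k][l]
    return out


# Hypothetical 3x3 generating kernel (unnormalized binomial weights).
SMALL = [[1, 2, 1],
         [2, 4, 2],
         [1, 2, 1]]

# Convolving the generating kernel with itself gives the equivalent
# 5x5 kernel that two sequential passes implement.
LARGE = conv2d_full(SMALL, SMALL)


def smooth_sequential(image, passes=2):
    """Apply the small kernel repeatedly instead of one large kernel."""
    for _ in range(passes):
        image = conv2d_full(image, SMALL)
    return image
```

Two passes of a 3x3 kernel cost 18 multiplications per output point against 25 for a single 5x5 kernel, and the saving grows with each additional pass.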


SMART SENSOR PROJECTS

The Hughes Research Laboratories have
continued their work on the development of smart
sensors for image understanding. Hughes is
presently completing construction of a new CCD
chip that performs the following functions:

3x3 Laplacian
5x5 median filter
5x5 programmable weight convolver
7x7 bipolar convolver
26x26 edge detection convolver.


Fig. 1. An aerial image of San Francisco Area.

Fig. 2. A schematic map.

Fig. 3. Internal Representation of Map.

Fig. 4. Recognized objects in Fig. 1.

IMAGE UNDERSTANDING RESEARCH AT CMU:
A Progress  Report 

Raj  Reddy 

Department  of  Computer  Science 
Carnegie-Mellon  University 
Pittsburgh,  Pa  15213 


INTRODUCTION 

The primary objective of our research effort is to
develop techniques and systems which will lead to
successful demonstration of image understanding concepts
over a wide variety of tasks, using all the available sources
of knowledge. We are focusing our attention on three areas
of research. First, we are developing an integrated concept
demonstration of an image understanding system. The
long-term goal of this research is to understand how knowledge
can be used in the image interpretation process to produce
systems which are 2 to 3 orders of magnitude more
cost-effective than current systems. Over the next three years
we expect to investigate how knowledge of maps, size and
shape of landmarks such as buildings and rivers, and
contextual relationships can be used in the interpretation of
satellite images of the Washington, D.C. area and color
scenes of downtown Pittsburgh.

The  second  area  of  research  is  the  development  and 
validation  of  concepts  for  computer  architectures  used  in 
image understanding. The long-term objective of this
research  is  to  develop  new  computer  architectures  which 
will  make  low-cost  image  processing  a serious  possibility. 
We  plan  to  evaluate  the  desirability  of  new  processor 
designs  and  new  instruction  sets  for  image  processing 
applications. 

The third area is the development of intelligent
interactive aids for tasks such as photo interpretation and
map generation. Many of the same techniques which are
useful in automatic interpretation are applicable in this area,
except that in this case the human being provides the goal
direction. The availability of intelligent assistants capable of
examining  large  image  data  bases  and  retrieving  desired 
information  is  expected  to  significantly  improve  human 
productivity  in  tasks  such  as  photo  interpretation  and 
cartography. 

The  following  is  a brief  summary  of  our  work  over  the 
last six months.


KNOWLEDGE REPRESENTATION AND SEARCH

During the past six months we have concentrated on
the detailed performance analysis of the ARGOS Image
Understanding System (Rubin, 1978) using hand segmented
data. Problems with inaccuracies in the camera model used
to generate the adjacency network and omissions in the
knowledge network were identified and corrected. Further
work is being done on mechanisms to resolve pointer
conflict in the backtrace of the LOCUS search. These
conflicts arise frequently due to the maintenance of multiple
interpretations, each generated by the application of
knowledge to different portions of the errorful signal data.
One technique is to apply the LOCUS search technique to the
backtrace to determine the "best" label.

We have begun to experiment with the use of a
modified relaxation technique within the ARGOS Image
Understanding System as an alternative to the LOCUS
search. We use the same optical match, contrast, location,
and adjacency knowledge, producing results comparable to
LOCUS but at a factor of 2 to 4 loss of speed. A report on
this work is forthcoming in "An Experiment with Search
Strategies for an Image Understanding System" (Smith,
1979).


IMAGE FEATURE ANALYSIS AND SEGMENTATION

A new approach to deriving three-dimensional surface
orientation from image textural properties is described in
"Shape from Texture: A Computational Paradigm" (Kender,
1979), in this volume. Introduced is a new representational
and computational tool, the normalized textural property
map, which unites and exploits a large class of low-level
image heuristics. An example of an application of the
paradigm to an abstract textured image is given, and the
relation of this work to existing work on shape is discussed.

We are continuing to study the effective use of
knowledge in image segmentation. The KIWI segmentation
program (Shafer and Kanade, in prep.) has incorporated a
fast algorithm for extracting descriptions of regions
resulting from a possible segmentation. By analyzing these
descriptions, noise elimination can be performed without the
use of global smoothing techniques. The speed of this
process allows KIWI to examine, in parallel, several possible
segmentations based on different image features, and to
select the segmentation which results in the most viable
region configuration.

The region extraction algorithm used in the KIWI
program is being extended to perform other related tasks.
Shafer is using this procedure to eliminate noise regions,
and has found it faster than "smoothing" techniques which
accomplish the same task. The procedure is less sensitive
to the size of the image than smoothing, and allows more
flexible definitions of "noise". The phenomenon of
"degenerate histograms" has been dealt with by the same
algorithm, and we are able to identify "busy" (textured)
areas of an image during segmentation without
preprocessing. We have solved the problem of the large
data tables required by this technique, and are extending it
to gather additional statistics about the regions processed.


3-D MODELING

It is a common experience for us that, given a single
2-dimensional picture of an object, we have one (or a few)
definite idea(s) about its 3-D shape, in spite of the fact that
a large number of possible shapes exist which produce the
same picture. This fact indicates that we use some
assumptions or knowledge about the objects and about the
image formation.

Kanade (Kanade, 1979) has been working to identify
some of these assumptions, mostly in the geometrical
aspects, by demonstrating how the theory and techniques
which exploit such assumptions can provide a systematic
shape-recovery method. The method consists of two parts:
qualitative shape recovery and quantitative shape recovery.
For the qualitative shape recovery we use a model of the
Origami world (Kanade, 1978), together with edge profiles of
lines taken across the line in the image in order to constrain
line labels in the search of plausible interpretations.

For the quantitative shape recovery, we adopt a
technique of mapping image regularities (in particular, the
parallelism of lines and the skewed symmetry) into shape
constraints, which is developed in (Kanade and Kender,
1979). Actual shape recovery from a single image is
demonstrated for the scenes of an object such as a box and
a chair. Given an image, the shape-recovery process
generates a 3-D shape description of objects in terms of
plane surfaces, and the description is supplied to a display
program which can synthesize images of the same object as
we would see it from other view directions.


INTERACTIVE  AIDS 

We  are  continuing  with  the  integration  of  map 
knowledge  of  the  Washington,  D.C.  area  into  our  system. 
The  map  knowledge  consisted  of  a terrain  (elevation) 
database  and  cultural  features  such  as  rivers,  major 
buildings,  forests  and  roads.  We  plan  to  apply  this 
knowledge  in  a system  which  will  match  satellite  and  aerial 
photographs  to  the  terrain  model  and  extract  information 
from  the  images  using  the  cultural  feature  data. 


ARCHITECTURES FOR IMAGE PROCESSING

SPARC,  the  high  speed  processor  being  jointly 
designed  by  Control  Data  and  CMU,  has  completed  the 
design  phase  and  begun  layout  and  fabrication.  Currently 
the  four  major  component  boards  which  contain  the  bulk  of 
the  custom  LSI  circuitry  are  being  routed  and  fabricated. 
We expect that the processor will be delivered to CMU in
the fall of 1979. Current gate level simulations indicate that
instruction speeds in the order of 20ns can be expected.

Researchers  at  CMU  have  already  designed  several 
prototype  NMOS  LSI  circuits  utilizing  graphics  software 
running  under  the  UNIX  operating  system.  Work  is 
underway to complete a design laboratory which will allow
top down design of VLSI circuits, as well as provide
post-fabrication packaging and testing facilities. The laboratory
is intended to allow computer scientists with a minimal
understanding of solid-state physics and IC design to rapidly
produce working circuits. A number of special purpose
chips are expected to be designed to implement common
image understanding algorithms, such as edge detectors and
smoothing operators.

Our collaboration with Texas Instruments (Eversole et
al., 1978) to jointly design and develop an all-digital
programmable VLSI chip set for several low level vision
operations has begun to result in breadboard designs for
several important operators: a programmable sum of
products operator and a 5x5 median operator.


CONCLUSION 

While the primary emphasis continues to be in
effective use of knowledge in the image interpretation
process,  the  research  at  CMU  is  tempered  by  the  realization 
that  we  must  also  pay  adequate  attention  to  other  relevant 
aspects  such  as  computer  architecture,  software  design, 
image  databases,  performance  analysis  and  perceptual 
psychology.  We  continue  to  have  modest  efforts  in  each  of 
these  areas. 
Current activities on this project are reviewed
under the following headings:

1. Image modelling and preprocessing
2. Edge detection and linking
3. Segmentation and texture analysis
4. Pattern, shape, and structure matching
5. Hierarchical region representation


INTRODUCTION 

This project is concerned with the study of
advanced techniques for the analysis of reconnaissance
imagery. It is being conducted under
Contract DAAG-53-76C-0138 (DARPA Order 3206),
monitored by the U.S. Army Night Vision Laboratory,
Ft. Belvoir, VA (Dr. George Jones). The Westinghouse
Systems Development Division, as a subcontractor,
is investigating hardware implementation
of the techniques being developed by Maryland,
particularly in the area of relaxation; their
efforts are reviewed in a separate paper in these
Proceedings.

The preceding phase of this project [1] was
concerned with tactical target detection on FLIR
imagery. The current phase deals with a wider
variety of imagery, and includes work on image
modelling and preprocessing; edge detection and
linking; segmentation and texture analysis; and
matching of patterns, shapes, and relational structures.
Particular emphasis is being placed on the use of
convergent evidence and cooperative computation
("relaxation") in segmentation and matching. A
recently initiated effort deals with hierarchical
region representations ("quadtrees") and their
uses in image analysis.

Work done during the first six months of the
current phase was summarized in [2-3], and will be
mentioned here only briefly, for background purposes.
This status report deals primarily with the work done
during the past six months. One topic, involving the
use of relaxation in segmentation, has been singled
out for more detailed discussion in a separate paper
in these Proceedings.


IMAGE  MODELLING  AND  PREPROCESSING 
Mosaic  models  for  Images 

Under an AFOSR grant [4], extensive work has
been done on a class of "mosaic image models" based
on random geometric processes. In cell structure
models, the given planar region is tessellated into
cells, and "colors" (e.g., gray levels) are
independently assigned to these cells according to
specified probabilities. Examples of such models
are the Poisson line model, in which the plane is
tessellated by random lines; the occupancy model,
in which random points in the plane define
"Dirichlet cells" consisting of the parts of the
plane that are closer to each point than to any of
the others; and the Delaunay model, in which the
tessellation is obtained by joining all pairs of
the random points whose Dirichlet cells are
adjacent. In coverage (or "bombing") models, randomly
oriented figures are placed at random points in
the plane, and colors are assigned to the figures
and background according to specified probabilities.
An introduction to such models can be found
in [5].
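
The occupancy model described above can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the procedure used in [4] or [5]: the function name, the use of uniformly distributed points on a square pixel grid, and the brute-force nearest-point search are all choices made here for brevity.

```python
import numpy as np

def occupancy_mosaic(size, n_points, colors, probs, seed=0):
    """Generate a size-by-size occupancy-model mosaic (illustrative sketch).

    Random points define Dirichlet cells; each cell independently receives
    one of `colors` with the corresponding probability in `probs`.
    Returns the image and the per-pixel cell index.
    """
    rng = np.random.default_rng(seed)
    # Random points ("nuclei") uniformly distributed over the square.
    pts = rng.uniform(0, size, size=(n_points, 2))
    # Independent color assignment, one color per cell.
    cell_colors = rng.choice(colors, size=n_points, p=probs)
    ys, xs = np.mgrid[0:size, 0:size]
    pix = np.stack([ys.ravel(), xs.ravel()], axis=1) + 0.5  # pixel centres
    # Squared distance from every pixel centre to every nucleus.
    d2 = ((pix[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # index of the Dirichlet cell of each pixel
    cells = nearest.reshape(size, size)
    return cell_colors[cells], cells

image, cells = occupancy_mosaic(64, 20, [0, 32, 63], [0.3, 0.4, 0.3])
```

Every pixel takes the color of the cell whose defining point is closest, so the resulting image is piecewise constant; adjacent cells that happen to receive the same color merge into one connected component.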

Images generated by mosaic models consist of
connected components each having a constant gray
level. Analytical results have been obtained on
the expected values of various geometrical
properties of these components, including the expected
area and width (= length of intercept by a random
line) of a component, the expected number of
components, and the expected total perimeter in the
mosaic. Results have also been obtained on the
joint gray level probability as a function of
separation in these mosaics; from this, the
autocorrelation, edge density, and variogram (= expected
squared gray level difference) can be derived. The
details can be found in [4].

The patterns generated by mosaic models are
much simpler than real images; more realistically,
the cells should be blurred and noisy. However,
if we compare the predicted values of various
properties for the models with those for suitable real
images, we find that the predictions obtained from
the "plausible" model (the one whose patterns the
image resembles) are much better fits to the
real data than those obtained from other models.

For example, consider the picture of marble
(Brodatz, Plate 62) shown in Figure 1, which
resembles the patterns generated by a Poisson line
model. Table 1 shows the observed values of
total perimeter and of expected component width
for this picture thresholded at 25 (on a 0-63
grayscale); these values are not very sensitive to
the choice of threshold. Predicted values based
on the Poisson line, occupancy, and Delaunay
models are also shown. We see that the Poisson
line predictions (perimeter 2695, width 19.07) are
much better fits to the observed values (2013 and
20.95) than those of the other two models, both of
which predict a perimeter above 8000. The details
of these experiments can be found in [6].

In principle, one could use the predicted
total perimeter values to predict the expected
edge strength for an image, by averaging the
expected edge strength in the interiors of cells (as
obtained from a model for the gray level population
in a cell) with that on intercell borders (as
obtained from the mosaic model itself). However,
such predictions would not be very accurate for
several reasons: (1) the model assumes that the
transitions between cells are sharp; (2) the
amount of border (i.e., the total perimeter)
cannot be predicted very accurately. (For a more
detailed discussion of these problems, see [6].)
Further work on the application of these models to
real images is needed.


Many image processing and segmentation techniques
assume that ideal images are approximately
piecewise constant, i.e., are composed of relatively
"flat" regions separated by relatively
steplike edges. (Note that the ideal images
generated by mosaic models do have these properties.)
This assumption provides a basis for conventional
methods of image segmentation by analysis of the
gray level histogram or other pixel feature space
(the regions should give rise to sharp histogram
peaks or compact feature value clusters) and for
edge detection (the region borders should give
rise to stronger edge values than the interiors).

In practice, images are noisy, and it is not
obvious how to approximate them by piecewise
constant functions. Noise can be reduced by local
averaging, but this blurs the edges between the
regions, which is also undesirable. A variety of
nonlinear noise cleaning schemes have been devised
that smooth noise without blurring edges. In [7],
a number of these schemes are compared. Their
definitions are summarized below; in each method,
a new gray level P' for point P is computed as a
function of the gray levels in its 3-by-3
neighborhood N(P):

a.  Mode filtering: P' is the most frequently
occurring gray level in N(P).

b.  Median filtering: P' is the median gray
level in N(P).

c.  P' is obtained by averaging P with the
k points of N(P) that are closest to it
in gray level.

d.  Gradient smoothing: P' is the average of
those points of N(P) that have lower
gradient values than P.

e1.  Selective averaging 1: P' is the average
of N(P) provided P differs from at least
6 of its neighbors by at least t.

e2.  Selective averaging 2: P' is the average
of N(P) provided the edge strength at P
is less than t; otherwise, P' is the
average of the two neighbors in the
direction along the edge.

e3.  Selective averaging 3: Analogous, but
using four directional edge masks, rather
than differences in two perpendicular
directions, to determine edge strength and
direction.

f.  Maximum homogeneity smoothing: Five 4xA
neighborhoods surrounding P are used; P'
is the average of that neighborhood which
is most homogeneous.

g.  Neighbor weighting (1,2): P' is a weighted
average of N(P). (The definitions of the
weights are somewhat complicated, and will
not be reproduced here.)

h.  Weighted averaging: P' is a weighted
average of P and the mean of N(P), where
the weight given to P depends on how high
the local image variance is relative to
the overall image variance. (This method
was not iterated.)


For the detailed definitions of the methods, see
[7]. A simple Kalman filtering scheme was also
applied to the same images. The best methods were
median filtering, gradient smoothing, and the first
neighbor-weighting method. Evaluations were based
on subjective judgment [7], on the improvement in
the image's histogram (i.e., emergence or clearer
separation of peaks), and on mean squared error
[8].
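
Two of the schemes listed above, median filtering (b) and averaging P with its k closest neighbors in gray level (c), can be sketched as follows. This is a minimal illustration rather than the code evaluated in [7]; the function names and the border handling (edge replication) are assumptions, and N(P) is taken to include P itself.

```python
import numpy as np

def neighborhoods(img):
    """Return an (H, W, 9) array holding each pixel's 3-by-3 neighborhood.

    Borders are handled by replicating edge pixels (an assumption).
    Index 4 along the last axis is the center pixel P.
    """
    padded = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    shifts = [padded[dy:dy + h, dx:dx + w]
              for dy in range(3) for dx in range(3)]
    return np.stack(shifts, axis=-1)

def median_filter(img):
    """Method (b): P' is the median gray level in N(P)."""
    return np.median(neighborhoods(img), axis=-1)

def k_nearest_average(img, k=5):
    """Method (c): average P with the k points closest to it in gray level."""
    nb = neighborhoods(img)
    center = nb[..., 4:5]
    # Order the nine values by closeness to the center value; P itself
    # has difference 0, so the first k + 1 entries are P and k others.
    order = np.argsort(np.abs(nb - center), axis=-1, kind="stable")
    closest = np.take_along_axis(nb, order[..., :k + 1], axis=-1)
    return closest.mean(axis=-1)
```

On an ideal step edge both functions return the image unchanged, whereas plain 3-by-3 averaging blurs the pixels adjacent to the step; this is the edge-preserving behavior the comparison is concerned with.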

The best few methods (the three just mentioned,
and also the E5 method) were also applied to a
color image [8]. Separate noise cleaning on the
individual R, G, B color components was found to be
somewhat more effective than "vector" noise
cleaning in the three-dimensional color space; but when
a different color coordinate system (U,V,W) was
used, the reverse was true. Details of the results
and their evaluation can be found in [8].

A number of new image smoothing techniques
have also been tested [9]; several of these give
results comparable in quality to those obtained
using the best methods tested in [7]. One approach
makes use of half-neighborhoods; it chooses three
consecutive neighbors of P whose average gray level
differs maximally from that of the other five, and
averages P with those five. Another approach uses
a weighted average of the neighbors in which a
neighbor's weight depends on how different it is
from P; however, this scheme does not smooth very
effectively. Still another idea is to average P
with all of N(P) if it differs from the mean of
N(P) by less than the standard deviation, and to
average P with only its four most similar neighbors
(i.e., E4) otherwise. The details of these methods,
and examples of results, can be found in [9].

With any point P of an image I we can
associate a gray level probability, π(P), estimated from
the image's histogram. If we regard π(P) as a
(rescaled) gray level, the resulting image may be
called the probability transform of I. (Other
types of probability transforms can be defined
based on the probabilities of local properties
other than gray level, or on joint probabilities;
the details will not be given here. An example
using joint probabilities is the "texture
transform" of Haralick.) Noise cleaning, thresholding,
edge detection, and other standard processing
techniques can be applied to a probability
transform rather than to the original image.
Experiments along these lines are in progress (e.g., [9]
gives some interesting noise cleaning results) and
will be described in a forthcoming technical report.
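
A minimal sketch of the gray level probability transform follows. The rescaling used here (the most probable gray level maps to the top of the range) is an assumption, since the text says only that the probability is regarded as a rescaled gray level; the function name is likewise hypothetical.

```python
import numpy as np

def probability_transform(img, levels=64):
    """Replace each gray level by its (rescaled) histogram probability.

    `img` must hold integer gray levels in 0 .. levels - 1.
    Returns the transformed image and the estimated probabilities.
    """
    hist = np.bincount(img.ravel(), minlength=levels)
    prob = hist / img.size  # estimated probability of each gray level
    # Assumed rescaling: the most probable level maps to levels - 1.
    scaled = np.round(prob / prob.max() * (levels - 1)).astype(int)
    return scaled[img], prob

img = np.array([[0, 0, 0, 5]])
out, prob = probability_transform(img)
```

Pixels whose gray levels are common in the image become bright and pixels with rare gray levels become dark, so standard operations such as thresholding can then be applied to the transformed image.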


EDGE DETECTION AND LINKING

The Roberts and Hueckel edge detectors

The classical approach to edge detection
involves convolving a set of masks, representing
steps in various orientations, with the image, and
defining the edge strength at a point to be the
maximum of the convolved values. The simplest scheme
of this type, due to Roberts, uses the masks

     1  0           0  1
     0 -1    and   -1  0

corresponding to the ±45° directions. Thus in the
2-by-2 image neighborhood

     A  B
     C  D

the Roberts edge strength is max(|A-D|, |B-C|).

A more sophisticated edge detection technique,
due to Hueckel, finds a best-fitting step edge to
a given image neighborhood, and takes the edge
strength to be the height of this step. Hueckel
used a relatively large neighborhood, and
determined the best fit by expanding the step edge and
the neighborhood in terms of a set of nine basis
functions. Various authors (e.g., Nevatia,
O'Gorman, Mero and Vassy, Hummel) have investigated
simplifications of Hueckel's approach.

The simplest possible version of Hueckel's
method uses a 2-by-2 neighborhood and three basis
functions. When the best-fitting step edge is
determined using this basis, it turns out that the
magnitude of this step is precisely the Roberts
edge strength. The details of the proof can be
found in [10].

Straight edge enhancement

There are a number of standard methods of
detecting straight edges (or lines) in an image. One
of the most commonly used of these is the Hough
transform, in which, for each edge point P, we
estimate the slope and distance from the origin of
the straight line through P. In this way, the
edge points are mapped into points in (slope,
distance) space. Evidently, sets of collinear
edge points map into approximately the same point
in this "Hough space"; thus we can detect straight
edges, whether continuous or broken, by looking
for high concentrations of points in Hough space.

A problem that arises in the use of Hough
transforms is the uncertainty in estimating the
slopes of the edge points. When a standard edge
detector, based on 3x3 neighborhoods, is used,
these slopes are quite uncertain. Moreover, this
leads to an even greater uncertainty in estimating
distance if the given edge point is far away from
the point where the line through it comes closest
to the origin.

The usefulness of the Hough transform can be
improved by increasing the accuracy of the edge
slope estimates. This can be done by iteratively
reestimating the slope and magnitude of edge
responses at each point based on the values of the
current estimates at nearby points.

The details of such an edge reinforcement
process are described in [11]. Figure 2 shows an
example involving an aerial photograph of an
airport.

Strip detection

Iterative reinforcement can also be used to
enhance parallel-sided strips, i.e., pairs of
"antiparallel" edges. An example, involving
vertical edges only, is given in [12]. We assume that
the vertical edge strength is a signed quantity,
with positive values corresponding to (low, high)
transitions, and negative values to (high, low).
A cross-shaped neighborhood of each edge point P
is examined. The edge strengths along the vertical
arm of the cross provide increments to the edge
strength at P. Thus edges along this arm having
the same sense as that at P reinforce P, while
those having the opposite sense weaken P. At the
same time, the edge strengths along the horizontal
arm provide decrements to the strength at P; thus
edges along this arm having the same sense as P
weaken P, while those having the opposite sense
reinforce P. As Figure 3 illustrates, a few
iterations of this process serve to greatly
enhance vertical-sided strips while weakening noise
edge responses. The details of the process, and of
variations on it, are described in [12].


SEGMENTATION AND TEXTURE ANALYSIS

Segmentation by relaxation

Extensive work has been done at Maryland and
elsewhere on the use of iterative probability
adjustment schemes ("relaxation") for
classification in the presence of constraints. In
particular, this approach has been applied to most of
the standard methods of image segmentation by pixel
classification. A package of programs has been
developed to facilitate experimentation with
relaxation methods at the pixel level; it will be
described in a forthcoming technical report.

A recent application of relaxation to
segmentation deals with thresholding, i.e., with the
classification of the pixels into "light" and "dark"
classes. Initially, estimated probabilities of
membership in these classes are computed for each
pixel on the basis of its gray level (i.e.,
proportional to the distances of its gray level from the
ends of the gray level range). These probabilities
are then iteratively adjusted based on the
neighboring probabilities, light reinforcing light and
dark dark. (This process is analogous to, but not
the same as, image smoothing by iterated local
averaging. Rather than smoothing the picture, it
tends to drive light gray levels toward white and
dark ones toward black.) An example of the
operation of this process is shown in Figure 4; it will
be described in more detail in a forthcoming
technical report. Other versions of this approach,
in which the classes need not be "light" and
"dark," but can be defined appropriately for the
given picture, are also under study.
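
The light/dark relaxation process can be sketched as follows. The report does not give the adjustment formula, so this sketch assumes a commonly used multiplicative probabilistic relaxation update with compatibilities +1 between like labels and -1 between unlike labels, averaged over the eight neighbors; the function name and parameters are hypothetical.

```python
import numpy as np

def relax_threshold(img, lo=0, hi=63, iterations=10):
    """Iteratively adjust per-pixel "light" probabilities (sketch).

    The initial probability of "light" is proportional to the distance
    of the gray level from the dark end of the range [lo, hi].
    """
    p = np.clip((img.astype(float) - lo) / (hi - lo), 0.0, 1.0)
    h, w = img.shape
    for _ in range(iterations):
        padded = np.pad(p, 1, mode="edge")
        # Sum over the 3-by-3 window, minus the center: 8 neighbors.
        total = sum(padded[dy:dy + h, dx:dx + w]
                    for dy in range(3) for dx in range(3)) - p
        nb_mean = total / 8.0
        # Net neighborhood support for "light", in [-1, 1];
        # the support for "dark" is its negative.
        q = 2.0 * nb_mean - 1.0
        light = p * (1.0 + q)
        dark = (1.0 - p) * (1.0 - q)
        denom = light + dark
        safe = np.where(denom > 0, denom, 1.0)
        p = np.where(denom > 0, light / safe, p)  # renormalize
    return p
```

Under this assumed update, a pixel whose neighbors are mostly "light" has its own light probability increased, and vice versa, so probabilities drift toward 0 or 1 rather than toward the local average; an isolated noisy pixel is pulled toward the class of its surroundings.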

Relaxation has also been applied to edge (or
curve) enhancement. Here edge probabilities are
initially assigned to each pixel based on the
relative responses of edge masks in various
orientations, and a "no edge" probability is also
assigned based on their absolute responses. These
probabilities are then adjusted iteratively based
on the neighboring probabilities. In particular,
edge reinforces edge to the extent that they
smoothly continue one another; no edge reinforces
no edge; edge reinforces no edge alongside it, and
competes with no edge at its end.

The light/dark and edge/no edge relaxation
processes can also be combined; e.g., edge
reinforces "light" alongside it on its light side, and
reinforces "dark" alongside it on its dark side,
and so on. This combination gives better results
than either of the two processes operating
individually. Experiments using this approach are
described in a separate paper in these Proceedings.

Using the combined relaxation process for
image segmentation is analogous to using
"convergent evidence," involving both gray level and edge
value, in segmentation. In the "Superslice"
convergent-evidence scheme, developed under an
earlier phase of this project, thresholding and
edge detection are applied separately; when both
give consistent results, we say that an object has
been detected. In the relaxation approach, gray
level and edge evidence interact from the
beginning; if they are consistent, they will mutually
reinforce, making object detection easy. This has
the potential advantage that it does not involve
information-destructive steps (thresholding, edge
maximum selection), as Superslice did.

Texture  analysis:  gray  level  cooccurrence  based 
on  edge  maxima 

One of the most effective current approaches
to texture classification employs properties
derived from second-order gray level probability
densities. For any displacement δ, let p_δ(z,w)
be the probability that a pair of points at
separation δ have gray levels z and w. The matrix
whose (i,j) entry is p_δ(i,j), where i,j run
through the possible gray levels, is called a
gray level cooccurrence matrix. Statistics
computed from such matrices have proved to be very
useful as textural properties.
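
As an illustration of the definition above, the following sketch estimates a gray level cooccurrence matrix for a given displacement and computes one statistic from it. The function names are hypothetical; the matrix is left unsymmetrized and the displacement is assumed to be smaller than the image, both choices made here for simplicity. The contrast statistic (the sum of (i-j)^2 weighted by the matrix entries) is one of the standard cooccurrence features.

```python
import numpy as np

def cooccurrence(img, dy, dx, levels):
    """Estimate p_delta(i, j) for the displacement delta = (dy, dx).

    `img` must hold integer gray levels in 0 .. levels - 1.
    """
    h, w = img.shape
    # Window of pixels P for which P + delta is still inside the image.
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    a = img[y0:y1, x0:x1]                      # gray level at P
    b = img[y0 + dy:y1 + dy, x0 + dx:x1 + dx]  # gray level at P + delta
    counts = np.zeros((levels, levels))
    np.add.at(counts, (a.ravel(), b.ravel()), 1)  # count each pair
    return counts / counts.sum()

def contrast(p):
    """Contrast statistic: sum over i, j of (i - j)^2 * p(i, j)."""
    i, j = np.indices(p.shape)
    return ((i - j) ** 2 * p).sum()

img = np.array([[0, 0, 1],
                [0, 1, 1]])
p = cooccurrence(img, 0, 1, levels=2)
```

For this small image and a displacement of one pixel to the right, the four point pairs are (0,0), (0,1), (0,1), and (1,1), so half of the probability mass lies off the diagonal.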

Davis has recently suggested a generalized
approach to cooccurrence analysis in which joint
probabilities are estimated only for pairs of
"special" points, e.g., local feature points.
Moreover, instead of joint gray level probabilities,
we can use joint probabilities of other
properties associated with a feature point. The
point pairs need not be at a fixed separation;
we can define δ at each feature point based on the
properties of that point. For example, we can
use edge strength maxima as feature points; we
can take δ to be in the direction along or across
the edge at the given point; and we can estimate
cooccurrence probabilities for pairs of edge slopes
obtained in this way. This yields what we may
call a slope cooccurrence matrix. The slope pair
statistics should provide useful information about
the sizes and shapes of the texture elements
defined by the edge maxima.

Alternatively [13], we can define texture
properties based on the gray level cooccurrences
for pairs of points one or both of which are edge
strength maxima. Specifically, at each edge
maximum P we look for another one (within some
bounded distance), say Q, in the direction along
or across the edge, and record the gray levels at
P and Q; the numbers of such pairs having given
gray levels provide estimates of the desired joint
probabilities. Alternatively, we simply move a
given distance δ from P in the direction along or
across the edge, and pair the gray level found
there with that at P. The resulting gray level
statistics will also depend on the texture element
sizes and shapes.

Pilot experiments [13] indicate that texture
properties derived in these ways are not always as
effective for texture discrimination as the
classical gray level cooccurrence statistics. For
example, Figures 5a and 6a show three samples each
of three terrain types and four textures from
Brodatz's album, and Figures 5b and 6b show the
corresponding edge maxima. Note that two of the
terrain types are hard to discriminate based on
the pattern of edge maxima alone. Not
surprisingly, properties derived from the maxima are usually
not effective in separating these classes, as seen
in Table 2. The maxima-based properties are more
effective in discriminating the Brodatz textures,
as Table 3 shows.


PATTERN,  SHAPE,  AND  STRUCTURE  MATCHING 
Feature  pattern  matching 

Correlation-based methods of matching two
images of the same scene are quite sensitive to
relative distortion. One way to overcome this is
to extract a set of feature points from each
image, and match the resulting feature patterns.
The computational cost of feature point matching
grows only with the number of feature points, not
with the picture size. Good match peaks can be
obtained even if many of the points detected in
one image are absent from the other one and vice
versa, and the matching process can be made
insensitive to appreciable amounts of distortion.
Experiments using this approach are described in
[14-15]; some of these were summarized at the
preceding workshop [16]. Fuzzy relaxation methods
can also be used as an aid in matching feature
patterns, as described in [16,17].

Relational  structure  matching 

Relaxation  methods  are  also  quite  effective 
for  matching  relational  structures,  e.g.,  for 
finding  matches  between  a "model  graph”  and  sub- 
graphs of  a given  "scene  graph".  Experiments 
using  this  approach  are  described  in  [18-19],  and 
were  summarized  at  the  preceding  workshop  [16]. 

For  symbolically  labelled  graphs,  where  exact 
matching  is  required,  "discrete  relaxation"  is 
appropriate  [18];  for  numerically  labelled  graphs, 
where  the  matching  is  quantitative,  one  can  use 
fuzzy  relaxation  [19]. 

Shape  matching 

A method  of  applying  relaxation  to  ambiguously 
segmented  one-dimensional  patterns  has  been  deve- 
loped under  an  NSF  grant  [20],  and  has  been  applied 
to  the  disambiguation  of  handwritten  words  [21]. 
(This  application  makes  use  of  a new  probabilistic 
relaxation  formula  [22]  which  can  be  easily  ex- 
tended to  allow  interactions  between  triples  of 
nodes  rather  than  pairs;  this  allows  trigram  fre- 
quencies to  be  used  in  defining  node  label  compat- 
ibilities.) Preliminary  experiments  have  been 
conducted  on  the  application  of  this  approach  to 
segmentation  and  labeling  of  the  boundary  of  a 
shape. 

An example, involving an airplane shape, is
shown in Figure 7. (A report on this work is in
preparation; we give here only a sketchy
description.) In this example, the boundary arc between
any pair of negative (= concave) curvature peaks,
not necessarily consecutive, was taken to be a
segment provided its length was between 10% and
50% of the perimeter. Initial probabilities for
the labels "nose", "tail", "Lwing", and "Rwing"
were assigned to each segment based on some simple
shape features. Table 4 lists the 28 segments and
the initial probabilities.

In the relaxation process, a label was dropped
if it was not part of a compatible consecutive
triple. Table 5 shows the numbers of segments
(having at least one surviving label) and labels
remaining after each iteration. After three
iterations, the process had stabilized, and each
node had only one label, as shown in Table 6. All
cycles of length 4 were found in the resulting
graph, and their compound probabilities were
computed, as shown in Table 7. There were only six
such cycles, and the three with high probabilities
correspond to approximately the same segmentation,
which is the plausible one.


HIERARCHICAL  REGION  REPRESENTATION 

Region representation plays a key role in image
and scene analysis, computer cartography, and
computer graphics. There are a variety of
approaches to representing regions, based on their
boundaries or their "skeletons"; some of these are
reviewed in the following paragraphs. Recently, a
tree representation has been proposed which offers
a number of advantages; it is also described below.
Since each type of representation has its own
advantages, it becomes desirable to develop
efficient methods of converting from one representation
to another.

The  boundary  of  a simply-connected  region  is 
specified,  relative  to  a given  starting  point,  by  a 
sequence  of  "unit"  vectors  in  the  principal  direc- 
tions. Such  "chain  codes"  provide  a very  compact  re- 
gion representation,  and  make  it  easy  to  detect  fea- 
tures of  the  region  boundary,  such  as  sharp  turns 
("corners")  or  concavities.  On  the  other  hand,  it 
is  harder  to  determine  properties  such  as  elongat- 
edness from  a chain  code,  and  it  is  also  difficult 
to  perform  operations  such  as  union  and  inter- 
section on  regions  represented  by  chain  codes. 
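
A chain code and the detection of boundary turns from it can be sketched as follows. The sketch assumes a 4-direction code (0 = east, 1 = north, 2 = west, 3 = south, with y increasing upward) and a boundary traversed counterclockwise; these conventions and the function names are assumptions made for illustration.

```python
# Unit steps for the assumed 4-direction code.
STEPS = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}

def decode(start, chain):
    """Return the boundary points visited by following the chain code."""
    x, y = start
    points = [(x, y)]
    for c in chain:
        dx, dy = STEPS[c]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

def turns(chain):
    """List (position, sense) for each direction change of a closed chain.

    Sense +1 is a left turn and -1 a right turn; on a counterclockwise
    boundary a right turn marks a concavity.
    """
    out = []
    n = len(chain)
    for i in range(n):
        d = (chain[(i + 1) % n] - chain[i]) % 4
        if d == 1:
            out.append((i, +1))
        elif d == 3:
            out.append((i, -1))
    return out

# An L-shaped region: five convex corners and one concave corner.
ell = [0, 0, 1, 2, 1, 2, 3, 3]
concave = [i for i, sense in turns(ell) if sense == -1]
```

Corners and concavities fall out of a single pass over successive code differences, which is the sense in which boundary features are easy to detect from this representation.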

Another class of region representations
involves various types of maximal "blocks" that are
contained in a given region. For example, we can
represent a region R as a linked list of the runs
(of pixels) in which R meets the successive rows
of the array. Here each "block" is a 1-by-m
rectangle, where m is the run length; the runs are
the largest such blocks that R contains, and R is
determined by specifying the initial points (or
centers) and lengths of the runs. Alternatively,
we can represent R by the set of maximal square
blocks (or blocks of any other desired shape) that
it contains; here R is determined by specifying
the centers and radii of these blocks. This
representation is called the medial axis transformation,
or MAT. It is somewhat less compact than a chain
code, but it has advantages with respect to
performing union and intersection operations or
detecting properties such as elongatedness (in
terms of the smallness of the radii relative to
the number of centers).
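
The run representation can be sketched as follows, with each run stored as a (row, initial column, length) triple in a Python list rather than a linked list; the function names are hypothetical.

```python
import numpy as np

def to_runs(mask):
    """Encode a binary region as a list of (row, start, length) runs."""
    runs = []
    for r, row in enumerate(mask):
        # Pad with zeros so runs touching the array border are closed.
        padded = np.concatenate(([0], row.astype(int), [0]))
        diff = np.diff(padded)
        starts = np.flatnonzero(diff == 1)   # first pixel of each run
        ends = np.flatnonzero(diff == -1)    # one past the last pixel
        runs.extend((r, int(s), int(e - s)) for s, e in zip(starts, ends))
    return runs

def from_runs(runs, shape):
    """Rebuild the binary region from its runs."""
    mask = np.zeros(shape, dtype=bool)
    for r, s, m in runs:
        mask[r, s:s + m] = True
    return mask

region = np.array([[0, 1, 1, 0],
                   [1, 1, 1, 1],
                   [0, 0, 0, 0],
                   [1, 0, 1, 1]])
runs = to_runs(region)
```

The region is fully determined by the initial points and lengths of its runs, so `from_runs(runs, region.shape)` reproduces the original array.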

There has been recent interest in an approach
to region representation based on successive
subdivision of the array into quadrants. If the
region does not cover the entire array, we
subdivide the array, and repeat this process for each
quadrant, each subquadrant, ... as long as necessary,
until we obtain blocks (possibly single pixels)
that are entirely contained in the region or
entirely disjoint from it. This process can be
represented by a tree of degree 4 (for brevity: a
quadtree) in which the entire array is the root
node, the four sons of a node are its quadrants,
and the leaf nodes correspond to those blocks for
which no further subdivision is necessary. Since
the array was assumed to be 2^n-by-2^n, the tree
height is at most n. This method of region
representation was proposed by Klinger; it has also
been used for image representation. It is
relatively compact, and is also well suited to
operations such as union and intersection, and to
detecting various region properties. A recent
Ph.D. thesis by Hunter in the domain of computer
graphics develops a variety of algorithms for the
manipulation of quadtree region representations.
Those algorithms, however, allow a node to store
the list of coordinate points that describe the
polygon from which the quadtree was constructed.
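
The subdivision process can be sketched as a short recursive function. The leaf labels ("B" for a block contained in the region, "W" for a block disjoint from it), the quadrant order NW, NE, SW, SE, and the nested-list tree are conventions assumed here for illustration; they are not taken from the work of Klinger or Hunter.

```python
import numpy as np

def build_quadtree(mask):
    """Build a quadtree for a binary region in a 2^n-by-2^n array.

    Returns "B" or "W" for a leaf, or a list of four subtrees in the
    order NW, NE, SW, SE.
    """
    if mask.all():
        return "B"  # block entirely contained in the region
    if not mask.any():
        return "W"  # block entirely disjoint from the region
    half = mask.shape[0] // 2
    return [build_quadtree(mask[:half, :half]),
            build_quadtree(mask[:half, half:]),
            build_quadtree(mask[half:, :half]),
            build_quadtree(mask[half:, half:])]

def height(tree):
    """Height of the tree; a single leaf has height 0."""
    return 0 if isinstance(tree, str) else 1 + max(height(t) for t in tree)

def count_leaves(tree):
    """Number of blocks in the decomposition."""
    return 1 if isinstance(tree, str) else sum(count_leaves(t) for t in tree)

region = np.zeros((4, 4), dtype=bool)
region[:2, :2] = True   # the whole NW quadrant
region[2, 2] = True     # one pixel in the SE quadrant
tree = build_quadtree(region)
```

For this 4-by-4 example (n = 2) the NW quadrant becomes a single leaf while the SE quadrant must be subdivided down to single pixels, so the tree reaches the maximum height of 2 with seven leaves.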

Since the quadtree and border representations
both have computational advantages, it is of
interest to develop methods of converting from
one representation to the other. Algorithms for
both of these tasks have been developed, and are
described in two reports [23,24]. Further work
is in progress on the use of quadtree
representations for such purposes as connected component
labeling and shape matching.
                  Perimeter     Width

Observed            2013        20.95
Poisson line        2695        19.07
Occupancy           8250        11.66
Delaunay            8055        12.19

Table 1.  Observed and predicted values of total
perimeter and expected component width
for Figure 1b.
Feature Type    Method                                 ASM     CON     ENT     COR

Gray level      δ = (0,2)                              B/A,C   A/B/C   B/A,C   A/B/C

Edge maxima     Along edge                             poor    poor    poor    A/B
                Across edge                            poor    poor    poor    A/B

Gray level      Most similar neighbor along edge       poor    poor    poor    poor
at and near     Most similar neighbor across edge      poor    B/C     poor    poor
edge maxima     Least similar neighbor across edge     poor    B/C     poor    poor
                Pair of neighbors across edge          poor    poor    poor    C/A,B
                All neighbors along edge               poor    poor    poor    A/B
                All neighbors across edge              poor    poor    poor    poor

Table 2.  Class separabilities for the terrain samples using various
types of cooccurrence-based properties ( / means "separat


Feature 


Method 


Most  similar 
neighbor  along 
edge 


poor  G/R/S/W  R/G,S 


Most  similar 
neighbor  across 
edge 


poor  G/R/S/W  R/S ,W 


Gray  level 
at  and  near 
edge  maxima 


Least  similar 
neighbor  across 
edge 


R/G  C,S/R,W  G/R/S/W  G/R/W 


Pair  of  neighbors 
across  edge 


G,S/R,W  W/G.S  G/R/S/W  R/G.S.W 


All  neighbors 
along  edge 


G/R/W  W/G.S  G/R/S/W  G.S/R.W 


All  neighbors 
across  edge 


G/R  G/R.W  G/R/S/W  G/R,W 


Gray  level 

6 - (0,2) 

W/G.R.S 

G/W/R.S 

G/W/R.S 

G/W/R.S 

Edge  maxima 

Along  edge 

W/R.S 

R/S/G.W 

W/G.R.S 

G/R/S.W 

Across  edge 

poor 

R/W/C.S 

W/G.S 

R/G.S.W 

Segment    Range of      Probabilities
Number     Segment       NOSE    RWING    TAIL    LWING

1 

4-8 

0.60 

0.24 

0.16 

2 

4-10 

0.52 

0.48 

3 

4-24 

1.00 

4 

4-26 

1.00 

5 

5-8 

0.60 

0.20 

0.20 

6 

5-10 

0.48 

0.52 

7 

5-24 

1.00 

8 

5-26 

1.00 

9 

6-8 

0.60 

0.18 

0.22 

10 

6-10 

0.46 

0.54 

11 

6-24 

1.00 

12 

6-26 

1.00 

13 

8-24 

0.30 

0.70 

14 

8-26 

1.00 

15 

10  - 24 

0.60 

0.38 

0.02 

16 

24  - 30 

1.00 

17 

24  - 32 

0.18 

0.60 

0.22 

18 

26  - 28 

0.60 

0.38 

0.02 

19 

26  - 30 

0.04 

0.96 

20 

26  - 32 

1.00 

21 

28  - 30 

0.60 

0.18 

0.22 

22 

30-8 

1.00 

23 

30  - 10 

1.00 

24 

32-4 

0.46 

0.54 

25 

32-5 

0.42 

0.58 

26 

32-6 

0.40 

0.60 

27 

32  - 18 

1.00 

28 

32  - 10 

1.00 

Table  4.  Segments  and  label  probabilities  for  Figure  7. 


Iteration    Number      Total Number of
Number       of Nodes    Labels at all Nodes

0            28          50
1            20          25
2            17          17
3            12          12

Table 5. Numbers of nodes and labels remaining at each
iteration.


Segment    Range of    Final Label
Number     Segment

 1         4-8         NOSE
 2         4-10        NOSE
 5         5-8         NOSE
 6         5-10        NOSE
 9         6-8         NOSE
10         6-10        NOSE
13         8-24        RWING
15         10-24       RWING
17         24-32       TAIL
24         32-4        LWING
25         32-5        LWING
26         32-6        LWING

Table 6. Nodes surviving at the third iteration, with
their unique labels.


Cycle      NOSE    RWING    TAIL    LWING    Merit
Number     Node    Node     Node    Node

1           1      13       17      24       13.6
2           2      15       17      24        6.4
3           5      13       17      25       14.6
4           6      15       17      25        6.3
5           9      13       17      26       15.1
6          10      15       17      26        6.3

Table 7. 4-cycles at the third iteration, with their
(original) compound probabilities.


Figure  1.  Brodatz's  marble  picture  #62:  (a)  original;  (b)  thresholded  at  25. 


Figure  2.  Three  iterations  of  straight  edge  reinforcement  applied  to  an  aerial  photograph  of  an 
airport,  (a)  Original;  (b)  Initial  edge  responses;  (c)  Results  of  iterations  1-3. 


Figure 3. Six iterations of strip enhancement applied to an aerial photograph of an airport.
(a) Original; (b) Initial edge responses and results of iterations 1-6.


Figure 4. Thresholding by relaxation. The "light" probability of each pixel is displayed as
a gray level. The parts show seven iterations and their corresponding histograms.


Figure 6. Brodatz texture samples (a) and their edge maxima (b). Top to bottom: sand, grass,
wool, raffia.


MIT PROGRESS IN UNDERSTANDING IMAGES


Patrick  H.  Winston  and  the  Staff 


The  Artificial  Intelligence  Laboratory 
Massachusetts  Institute  of  Technology 


In this series of image understanding conference proceedings, we
have stressed the key issue of representation. In particular, we
have described the work of Horn and his collaborators using the
reflectance map and the albedo image in working with satellite
images, and we have described the work of Marr and his
collaborators using the primal sketch, the 2 1/2-D sketch, and
body-centered, 3-D models to work toward a comprehensive theory
of recognition.

In the November, 1978 Proceedings, we reviewed the
overall program, briefly explaining our approach, stating the
objectives, and citing the fundamental tools. Then we summarized
the results obtained through an enumeration of representative
individual efforts.

Here we concentrate on Horn's group's work on hill
shading and atmospheric modeling and on Marr's group's
discoveries about texture and about zero-crossings, with particular
emphasis on stereo.


Zero-Crossings and the Primal Sketch


The early work in Marr's group was largely devoted to the
construction of a raw primal sketch of the image, a primitive
description of the intensity changes in terms of edges, bars, blobs,
and terminations, which are each characterized by position,
orientation, contrast, and size. During the last year, the original
structure of the primal sketch computation has been put aside, in
favor of a better theory, because of recent developments in the
theory of early vision. These are (i) the emergence of
quantitative studies from psychophysics of the exact nature of
the channels involved in early human vision (for example,
[Wilson and Bergen 1979]), and (ii) Marr and Poggio's theory of
stereopsis, which uses the zero-crossings in a set of second
directional derivative operators for the stereo-matching process.

Computing the raw primal sketch falls naturally into
two parts: (i) since intensity changes occur in natural images
over a wide range of scales, we first detect and represent the
intensity changes at a set of different scales; and (ii) the
descriptions that arise from these independent channels are then
combined into a single primal sketch of the image.

Marr and Hildreth have shown that, provided some
weak conditions are satisfied, intensity changes at a particular
scale in an image $I(x,y)$ are best detected by locating the
zero-crossings of $\nabla^2 G(x,y) * I(x,y)$, where $G(x,y)$ is the
two-dimensional Gaussian distribution, and $\nabla^2$ is the Laplacian.
The operator $\nabla^2 G$ uniquely satisfies certain critical properties of
localization in space and frequency. The smallest operator
usable in practice has a diameter of 9 picture elements in the
central, positive region, with an overall support of roughly 1000
pixels. Interestingly, this is roughly the size of the smallest
channel found in early human vision. The zero-crossings are
then represented by a set of oriented primitives called
zero-crossing segments, each describing a piece of the contour
whose intensity slope (the rate at which the convolution changes
across the segment) and local orientation are roughly uniform.
Small, closed contours are represented as blobs, also with an
associated orientation, average intensity slope, and size defined
by their extent along a major and minor axis.

Some intensity changes will give rise to zero-crossings
over a range of adjacent scales, while others may be detected
only at a single scale. In combining information from the
separate channels, we take advantage of the observation that
intensity changes in an image arise from surface discontinuities,
or from reflectance or illumination boundaries, which all have
the property that they are spatially localized. This observation
led to the spatial coincidence assumption, which states that if
similar information concerning the presence of an intensity
change is found across a set of adjacent channels, they most
likely describe the same physical intensity change, so their
descriptions may be integrated into a single description of an
edge. Information in one channel which does not coincide with
that from adjacent channels is assumed to arise from a physical
phenomenon which can only be measured at that one scale, so it
gives rise to an independent descriptive element. The final raw
primal sketch contains a binary map specifying the position of
original zero-crossing contours, together with the symbolic
description of the intensity changes, obtained from the separate
channels. Figures 1, 2, and 3 illustrate these components of the
raw primal sketch. Figure 1b shows the map of zero-crossing
contours for the image in Figure 1a, while figures 2 and 3 show
symbolic representations of some of the descriptors attached to
the locations marked in figure 1b. Figure 2 illustrates the blobs
detected in the image, and figure 3 the local orientations
assigned to edge segments. These diagrams show only the
spatial information contained in the descriptors. Typical
examples of the full descriptors are:

(BLOB (POSITION 146 21)
      (ORIENTATION 105)
      (CONTRAST 76)
      (LENGTH 16)
      (WIDTH 6))

(EDGE (POSITION 104 23)
      (ORIENTATION 120)
      (CONTRAST -25)
      (LENGTH 25)
      (WIDTH 4))

The descriptors to which these correspond are marked with
arrows. The resolution of this analysis of the image roughly
corresponds to what a human would see viewing it from a
distance of about 6 feet. Finally, there is reason to believe that
this representation is complete, and so, in principle, invertible
[Marr, Poggio, and Ullman 1978].


Zero-Crossings and Primal-sketch Inversion


The new stress on zero-crossings raises the question of how
complete the information provided by them is. For stereo
analysis, the image is decomposed into a sum of approximately
bandpass components (1-2 octaves wide at half-power points) and
zero-crossing contours are obtained for each component. Marr,
Poggio, and Ullman [1978] have suggested that each component
may in fact be determined up to a constant factor by just its
zero-crossing contours. By extending a theorem by Logan [1977]
for the one-dimensional case, they showed this to be the case for
two-dimensional entire functions with bandwidth less than an
octave. However, it is not clear that this result would apply even
approximately for practical signals which do not satisfy the ideal
'less than an octave bandpass' requirement even at their half
power points.

One way to relax the bandwidth restriction is to record
the steepness of the surface (surface gradient) through the
zero-crossings as well. To investigate the information content of
zero-crossing contours and the surface gradient along those
contours in practical situations, the problem of reconstructing the
original surface from that information was studied.

A direct approach to reconstruction would be to find
an interpolation operator which could be used to reproduce the
boundary conditions given. These are the approximate
bandpass characteristic and the zero-crossing contours and
intensity slopes. Thus one wants an interpolation function that
preserves the bandpass property while allowing the introduction
of zero crossings with a given gradient. Perhaps the simplest
suitable local operator is the derivative of the two-dimensional
Gaussian:

$$\frac{\partial g}{\partial x} = -2x\,e^{-(x^2+y^2)}$$

This function dies away rapidly with distance from the origin
and its only zero-crossing contour lies on the y-axis and has
maximum gradient at the origin. The Fourier transform of this
function has the same form and so $\partial g/\partial x$ has a half power
bandwidth slightly greater than that obtained with the
difference of Gaussian (DOG) convolution. The same holds for
translations and rotations of $\partial g/\partial x$ in the x, y plane, and for
sums of these functions. Thus it is possible to construct a
function which has a given set of zero-crossing contours and
gradients along those contours with arbitrary accuracy with a
sum of the form:

For example, a rough approximation to the surface zero-crossing
and zero-crossing gradient is obtained if N evenly spaced points
$(x_i, y_i)$ are selected along the zero-crossing contours and
$(a_i, b_i)$ is the gradient $(\partial f/\partial x, \partial f/\partial y)$ of the original surface at the
zero-crossing point $(x_i, y_i)$. The fixed constant $\sigma$ determines the
bandpass range of the sum. Since the individual terms of the
sum above are small everywhere except in the immediate
neighborhood of $(x_i, y_i)$ and the terms themselves are spaced
evenly, the gradient of the sum in the vicinity of each point
$(x_i, y_i)$ is determined largely by the ith term. This makes it
possible to design a simple iterative algorithm for adjusting the
ith term so that the position and gradient of the zero-crossing
achieve the desired value at $(x_i, y_i)$. It turns out that a
two-dimensional function which satisfies the zero-crossing
boundary conditions, and which has a similar spatial frequency
bandpass, can be constructed rather efficiently by this means.
Such functions can be used to test empirically conditions under
which the zero-crossing information is sufficient to determine
the overall function.
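The construction described above can be sketched in a few lines of Python. The exact form of the sum is not given here, so the sketch assumes each term is a shifted, rotated Gaussian derivative, (a_i u + b_i v) exp(-(u^2 + v^2)/sigma^2) with u = x - x_i and v = y - y_i, which vanishes at (x_i, y_i) and has gradient (a_i, b_i) there. The function names, the fixed number of sweeps, and the numerical gradient are illustrative choices, and only the gradient adjustment is shown; the position adjustment is left out.

```python
import math

def make_surface(points, coeffs, sigma):
    """Sum of shifted, rotated Gaussian-derivative terms (assumed form)."""
    def f(x, y):
        total = 0.0
        for (xi, yi), (a, b) in zip(points, coeffs):
            u, v = x - xi, y - yi
            total += (a * u + b * v) * math.exp(-(u * u + v * v) / sigma ** 2)
        return total
    return f

def numerical_gradient(f, x, y, h=1e-5):
    """Central-difference estimate of (df/dx, df/dy)."""
    return ((f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h))

def fit_coefficients(points, targets, sigma, sweeps=20):
    """Adjust each term's (a_i, b_i) until the gradient of the whole sum
    at (x_i, y_i) matches the target gradient there."""
    coeffs = [tuple(t) for t in targets]          # start from the targets
    for _ in range(sweeps):
        f = make_surface(points, coeffs, sigma)
        updated = []
        for (xi, yi), (a, b), (ta, tb) in zip(points, coeffs, targets):
            gx, gy = numerical_gradient(f, xi, yi)
            # Correct the i-th term by the residual gradient at its own point.
            updated.append((a + (ta - gx), b + (tb - gy)))
        coeffs = updated
    return coeffs
```

Because each term is small away from its own point, the correction at one point barely disturbs the others, and a few sweeps suffice when the points are spaced several sigma apart.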

Figures 4 through 7 show the results of the
above reconstruction method applied to the zero-crossing
gradients of an image convolved with a DOG mask with
central panel width of 12 pixels. Figure 4 shows the positive
values of the original 128 by 128 convolved image as gray levels.
The negative values and values near zero appear as white.
The x's indicate the zero-crossing points used for the
reconstruction, which were obtained by dividing the image into
a grid of 6 by 6 squares and selecting one zero-crossing point
from each square having a zero-crossing contour through it.
Figure 5 shows the surface reconstructed using just the surface
gradient at each of the x's. Figures 6 and 7 are overlays of
graphs of horizontal slices through the original surface in
figure 4 and the reconstruction in figure 5.

Further work is planned to explore the
possibility of using similar techniques to approximate a function
from just its zero-crossing contours.


Figure 4: Image convolved with a difference of Gaussian mask.

Figure 5: Image reconstructed from zero-crossings information.


Zero-Crossings  and  Stereo  Theory 

Working with Tomaso Poggio (of the Max Planck Institute for
Biological Cybernetics), Marr has developed a new theory of
stereo vision. The resulting algorithm consists of five steps: (1)
Each image is filtered with bar masks of four sizes that vary
with eccentricity; the equivalent filters are about one octave
wide. (2) Zero-crossings of the filter outputs are localized, and
the local orientation of the zero-crossings is computed. (3) For
each mask size, matching takes place between pairs of
zero-crossings of the same sign in the two images, for a range of
disparities up to about the width of the mask's central region.
(4) Wide masks are used to do rough disparity calculation, thus
causing small masks to come into correspondence. (5) When a
correspondence is achieved, it is written into the 2 1/2-D sketch.
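Steps (2) and (3) can be illustrated on a single scanline and a single mask size. The sketch below is a simplification under stated assumptions: a one-dimensional second derivative of a Gaussian stands in for the bar masks, the mask is forced to zero mean, border samples are replicated, and a match is accepted only when exactly one same-sign candidate lies within the disparity range. The function names are hypothetical, and the coarse-to-fine control of step (4) is omitted.

```python
import math

def d2_gaussian_mask(sigma):
    """1-D second derivative of a Gaussian, shifted to zero mean."""
    half = int(math.ceil(4 * sigma))
    mask = [((x * x) / sigma ** 4 - 1 / sigma ** 2)
            * math.exp(-x * x / (2 * sigma ** 2))
            for x in range(-half, half + 1)]
    mean = sum(mask) / len(mask)
    return [m - mean for m in mask]

def convolve(signal, mask):
    """Convolve a scanline with the mask, replicating border samples."""
    half = len(mask) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, m in enumerate(mask):
            j = min(max(i + half - k, 0), n - 1)
            acc += m * signal[j]
        out.append(acc)
    return out

def zero_crossings(filtered, eps=1e-9):
    """Return (position, sign) pairs; sign is +1 for a - to + transition."""
    found = []
    for i in range(len(filtered) - 1):
        a, b = filtered[i], filtered[i + 1]
        if a < -eps and b > eps:
            found.append((i, +1))
        elif a > eps and b < -eps:
            found.append((i, -1))
    return found

def match(left, right, max_disparity):
    """Pair each left crossing with the unique same-sign right crossing
    within max_disparity; ambiguous cases are left unassigned."""
    disparities = {}
    for pos, sign in left:
        candidates = [p for p, s in right
                      if s == sign and abs(p - pos) <= max_disparity]
        if len(candidates) == 1:
            disparities[pos] = candidates[0] - pos
    return disparities
```

For a step edge that appears three samples further along in the right scanline, the single zero-crossing in each filtered signal is matched with a disparity of 3 when the search range is about the width of the mask's central region.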

During the past twelve months, an
implementation of the Marr-Poggio theory has been tested on a
number of images. Such testing has served two purposes: it
helps  prove  the  sufficiency  of  the  theory,  and  it  helps  indicate 
problems  or  omissions  in  the  theory  itself.  The  implementation 
was  found  to  be  quite  successful  in  computing  disparity  from  a 
stereo  pair  of  images.  The  images  tested  included  both  natural 
scenes  and  random  dot  patterns,  a critical  test  case  for  any  stereo 
program. 

The nature of the theory underlying the
stereo algorithm, and in fact the nature of the current theory of
the primal sketch, require that the stereo algorithm will at best
only determine disparity values along certain contours in the
image, the primal sketch descriptors. A natural next step is the
determination of disparities for any point in the image.

The  problem  consists  of  finding  a surface 
which  fits  the  boundary  conditions  imposed  by  the  disparity 
contours  of  the  stereo  algorithm.  In  general,  a large  number  of 
surfaces  will  satisfy  these  conditions.  However,  by  analyzing  the 
process  by  which  the  contours,  upon  which  the  stereo-matching 
process  is  performed,  are  formed,  one  can  determine  constraints 
upon  how  the  surface  may  change  at  positions  in  the  image  not 
corresponding to primal sketch elements (i.e. zero-crossing
contours). As well, a number of experiments concerning human
perception of surfaces in depth have provided additional
constraints on the interpolation or "filling in" process. From
such  constraints  a number  of  algorithms  have  been  proposed 
for  constructing  surfaces  from  the  stereo  information.  These 
algorithms  are  currently  being  implemented  and  will  be  tested 
in  the  near  future.  It  is  expected  that  such  tests  will  help 
indicate  the  viability  of  such  algorithms,  both  as  potential 
models  for  human  perception  and  as  algorithms  for  the 
construction  of  surface  representations  in  general  image 
processing. 


Texture 

Not all of Marr's people have been absorbed by the
zero-crossings work. Stevens' new thesis [Stevens 1979] addresses
the representation of surface orientation, texture gradients, and
surface contours.

He has found that the two degrees of
freedom of surface orientation are usefully described relative to
the viewer in terms of slant and tilt. Slant describes "how
much,"  and  tilt  describes  "which  way."  In  terms  of  the  local 
surface  normal,  slant  is  measured  by  the  angle  from  the  line  of 
sight  to  the  normal,  and  tilt  is  the  orientation  to  which  the 
normal  would  project  in  the  image.  One  benefit  afforded  by 
this  essentially  polar  form  is  the  decomposition  of  the  problem 
of  determining  surface  orientation  into  two  subproblems, 
determining  slant  and  tilt,  which  are  often  solvable  by  distinct 
and  independent  methods. 
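As a concrete reading of these definitions, the sketch below converts a surface normal to slant and tilt and back. It assumes a viewer-centered frame in which the line of sight to the viewer is the +z axis and the image plane is the x-y plane; that convention and the function names are assumptions made for illustration.

```python
import math

def slant_tilt(normal):
    """Slant: angle between the line of sight (+z) and the normal.
    Tilt: image-plane direction of the projected normal."""
    nx, ny, nz = normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    cos_slant = max(-1.0, min(1.0, nz / length))
    slant = math.acos(cos_slant)
    tilt = math.atan2(ny, nx)   # arbitrary when slant is 0 (frontal surface)
    return slant, tilt

def normal_from_slant_tilt(slant, tilt):
    """Inverse mapping: unit normal for a given slant and tilt."""
    return (math.sin(slant) * math.cos(tilt),
            math.sin(slant) * math.sin(tilt),
            math.cos(slant))
```

The two angles separate cleanly: slant depends only on the z component of the unit normal, and tilt only on the direction of its x-y projection.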

Texture gradients are examined as a source
of distance information and of surface orientation. See figure 8.

Figure 8: Texture gradients are a source of distance and
surface-orientation information.


The quantitative measures of texture that would be necessary
for either computation are naturally described relative to the
local tilt frame, a local coordinate system whose y-axis is aligned
with the texture gradient, and therefore corresponds to the
surface tilt. The tangent of the slant angle is proportional to
the rate of change of texture density, area, and lengths
measured relative to the tilt frame. But distance (up to a scale
factor) can be simply computed by the reciprocal of lengths
measured along the x-axis, assuming that those lengths
correspond to uniform surface texture dimensions. (Texture
lying along the x-axis corresponds to surface texture that is
equidistant from the viewer - those dimensions are not
foreshortened.) Stevens has observed that the orientation of the
x-axis corresponds to the image orientation in which texture is
locally constant, that is, the tilt frame can be determined without
considering the orientation of the gradient. Hence a "depth
map" may be derived from a "texture gradient" without either
determining the orientation of the gradient or its magnitude.
Surface shape is therefore more simply computed in terms of
distance than local surface orientation.
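The reciprocal relation between image length and distance follows from perspective projection. As a short worked step, assume a pinhole camera with focal length $f$ (a symbol introduced here for illustration) and a texture element of true length $L$ lying perpendicular to the line of sight at distance $d$:

```latex
\ell = \frac{f L}{d}
\quad\Longrightarrow\quad
d = \frac{f L}{\ell} \;\propto\; \frac{1}{\ell},
\qquad
\frac{d_1}{d_2} = \frac{\ell_2}{\ell_1}
\quad\text{when } L \text{ is the same for both elements.}
```

Here $\ell$ is the length measured in the image along the x-axis of the tilt frame. Since $f$ and $L$ are unknown, only ratios of distances are recovered, which is the sense of "up to a scale factor".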


Figure  9:  Surface  contours  yield  information  about  shape. 


Surface contours are an important source of
information about surface shape; however, their analysis has
received almost no theoretical attention. See figure 9. The edge
of a shadow cast on a surface, for example, may tell us of the
shape of the surface, given some assumptions. The contours in
the image of glossy surfaces are similarly useful. But in order
that a surface contour may tell us of the shape of the surface,
various assumptions must be made. To explore these
assumptions, Stevens has decomposed the problem into two
aspects: (a) determining the shape of the three-dimensional


curve  that  Marr  calls  the  contour  generator,  whose  image  is  the 
surface  contour,  and  (b)  determining  how  the  surface  lies  under 
the  contour  generator.  This  is  analogous  to  (a)  bending  a wire 
in  space,  so  that  it  appears  like  the  surface  contour,  and  (b) 
gluing  a ribbon  along  the  wire,  the  ribbon  representing  the 
strip  of  surface  under  the  contour  generator.  The  first  aspect 
may  be  constrained  by  assuming  the  principle  of  general 
position,  planarity,  symmetry,  and  constancy  of  curvature  along 
the  contour  generator.  The  second  aspect  may  be  constrained 
by assuming either that the contour generator is a planar,
asymptotic  curve  (as  are  gloss  contours  in  orthographic 
projection)  or  a geodesic  curve.  In  either  case,  the  surface 
orientation  can  be  solved  once  the  three-dimensional  shape  of 
the  contour  generator  is  determined. 

Making Faithful Synthetic Images Requires Good
Atmospheric Modeling

Turning now to Horn's work, recall that in the last proceedings,
we listed the following uses for synthetic images: automated
generation of shaded relief maps, generation of low-level,
obliquely-viewed images, generation of special maps that bring
out particular terrain features, classification of ground cover for
crop prediction, matching images to terrain data for satellite
navigation, and making maps for automatic or semiautomatic
change detection.

In general, four factors must be considered
when making synthetic images for these purposes. They are: (i)
imaging geometry - the projection of the viewed scene onto the
image, (ii) incident illumination - the intensities and distribution
of light sources, (iii) surface photometry - the way a surface
reflects light, and (iv) surface topography - the shape of things
in the scene.

Synthetic images that are to mimic images obtained
from spacecraft require attention to a fifth factor: the
atmosphere attenuates visual signals, scatters spurious light into
the viewing port of the satellite, and illuminates the ground as a
large, diffuse light source.

If effects of imaging geometry, illumination,
topography, and atmosphere are removed, then ground cover
can be identified. Previous MIT research has led to
understanding the geometry of LANDSAT imaging [Horn and
Woodham 1978], and to success in eliminating topographic
effects and differential exposure to direct sunlight [Horn 1978].
The next step is to consider carefully the nature and extent of
the sky's influence.

The problem of atmospheric effects is quite
complex and mathematically intricate. The literature of remote
sensing, atmospheric science, geophysics, and space science is
filled with detailed reports on absorption, transmission, and
radiance, complete tabulations of wavelength-dependent
behavior, and sophisticated models of particle scattering.
However, the emphasis has always been on the degradation of
visual signals passing through the atmosphere. The little
research on the contribution of sky luminance to the imaging
process has been restricted to flat areas of the earth's surface,
either agricultural lands or the ocean. Woodham and Horn
[1978] have shown that, at least in the shorter wavelengths, sky
illumination is not negligible in LANDSAT multispectral
scanner images. Part of this is due to the relatively low sun
elevation (about 9:30 a.m. local time for LANDSAT 1 and 2)
which puts a large portion of rugged terrain in shadow. For
these areas, the sky is the only light source (ignoring such
things as mutual illumination of one side of a valley by the
opposite side). Work is underway to investigate the interaction
of sky radiance and surface reflectance in areas of rugged
topography.

The  research  approach  follows  that  of  prior 
work  in  the  production  of  albedo  maps.  A high-resolution 
digital  terrain  model  represents  the  topography  of  a section  of 
the  earth.  A synthetic  image  [Horn  and  Bachman  1978]  is 
created,  assuming  uniform  Lambertian  surface  reflectance,  a 
point  sun  at  a given  elevation  and  azimuth,  known  LANDSAT 
imaging  geometry,  and  a model  of  sky  luminance,  absorption, 
and  backscatter.  Point-by-point  comparison  with  actual 
LANDSAT  imagery  of  the  corresponding  earth  region  shows 
variations  from  the  assumed  uniform  reflectance,  from  which 
the  intrinsic  reflectance,  or  albedo,  of  each  point  is  calculated.  A 
new  picture  (the  albedo  map)  is  created  from  these  albedos, 
which  is  suitable  for  terrain  classification.  The  problem,  of 
course,  is  to  develop  a sufficiently  accurate  model  of  the 
atmosphere.  The  model  must  be  computationally  feasible,  the 
simpler  the  better.  Therefore,  it  must  use  as  much  local 
information  as  possible,  that  is,  be  applicable  to  individual 
pixels  and  their  neighborhoods.  By  their  very  nature,  however, 
atmospheric  effects  are  global.  Some  theoretical  work  has 
already  been  done  relating  general  illumination  and  surface 
reflectance  [Horn  and  Sjoberg  1979].  Work  is  continuing. 
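The albedo-map procedure described above can be sketched for the simplest case. The code below is a minimal illustration under strong assumptions: it uses only a point sun with Lambertian shading and leaves out sky luminance, absorption, and backscatter entirely, which is precisely the part the text identifies as the open problem. The terrain is a small grid of elevations, the sun is given as a unit vector, and all function names are hypothetical.

```python
import math

def surface_normal(dem, i, j, spacing):
    """Unit normal from central differences (rows run along y, columns along x)."""
    p = (dem[i][j + 1] - dem[i][j - 1]) / (2 * spacing)   # dz/dx
    q = (dem[i + 1][j] - dem[i - 1][j]) / (2 * spacing)   # dz/dy
    norm = math.sqrt(p * p + q * q + 1.0)
    return (-p / norm, -q / norm, 1.0 / norm)

def synthetic_image(dem, sun, spacing=1.0):
    """Unit-albedo Lambertian shading for interior cells.
    Border cells and cells facing away from the sun are set to 0."""
    rows, cols = len(dem), len(dem[0])
    img = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            n = surface_normal(dem, i, j, spacing)
            img[i][j] = max(0.0, sum(a * b for a, b in zip(n, sun)))
    return img

def albedo_map(actual, synthetic, floor=1e-6):
    """Ratio of observed to predicted brightness; None where the
    prediction is too small for the ratio to be meaningful."""
    return [[a / s if s > floor else None for a, s in zip(row_a, row_s)]
            for row_a, row_s in zip(actual, synthetic)]
```

In this simplified model a shadowed cell has a predicted brightness of zero and so yields no albedo estimate at all, which shows why a model of sky illumination is needed for rugged terrain.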
1. Model Refinement

1.1. Procedural Description

One important goal of the Rochester
Vision Project is to investigate a
generalized form of procedural
invocation in which an executive
procedure chooses worker procedures to
perform a job not just on the basis of
input/output behavior (as traditional
pattern-directed invocation does), but
also taking into account cost/benefit
estimates and perhaps other information
as well. This scheme is motivated by
the desire to have the advantages of
declarative knowledge about what is
doable (the descriptions) along with the
advantages of procedural knowledge about
how to do it (the workers). The
declarative, descriptive component will
allow conveniences such as the modular
addition of procedural knowledge. The
main research issue is to decide what
exactly needs to be known about worker
procedures, and how to express that in a
useful and uniform manner. This must
also be coordinated with the use of
relational constraints (Russell and
Brown, 1978). The most recent and
presently contemplated work at Rochester
explores aspects of these issues (e.g.
Lantz, Ballard, and Brown, 1978).

1.2.  Decision  Theory 

The  use  of  decision  theory  not  only 
as  an  abstract  model  of  intelligent 
perception  but  as  a practical  tool  to 
maximize  computational  benefit/cost  is 
being  investigated  in  the  context  of 
procedural  invocation.  This  work 
continues  in  the  tradition  of  Bolles, 
Sproull,  and  Garvey,  and  ultimately  we 
hope  to  extend  some  of  their  results  to 
deal  with  formal  problems  that  more 
closely  approximate  the  sorts  of  vision 
problems  encountered  in  our  particular 
applications.  Ballard  (see  Section  2) 
uses  decision  theory  techniques  to 
choose  the  most  economical  method 
(assuring  adequate  accuracy)  of  locating 
anatomical structures in large-format
images.

2. Applications in Biomedicine

The model-directed finding of ribs
in chest radiographs [Ballard, 1978]
provides an illustration of the use of
the Rochester Vision System,
incorporating procedure description,
utility measures, and top-down,
model-directed perception. The object
here is to cope with large amounts of
possibly low-quality data without undue
processing time by depending on a
declarative model of anatomical
structures, described procedural
knowledge about how to locate them, and
an executive which uses decision theory
to control the image-understanding
process. A prototype complete analysis
system  is  now  being  developed. 
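One simple way to picture an executive that weighs cost against benefit is an expected-utility rule. The sketch below is not the Rochester system's mechanism; it assumes each worker procedure is described by an estimated cost and an estimated probability of producing an adequate result, and it picks the worker with the highest expected utility among those meeting an accuracy threshold. The class, field, and worker names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    name: str
    cost: float        # estimated computation cost
    p_success: float   # estimated probability of an adequate result

def expected_utility(worker, benefit):
    """Benefit weighted by the chance of success, minus the cost of trying."""
    return benefit * worker.p_success - worker.cost

def choose_worker(workers, benefit, min_p_success=0.0):
    """Return the eligible worker with the highest expected utility,
    or None if no worker meets the accuracy threshold."""
    eligible = [w for w in workers if w.p_success >= min_p_success]
    if not eligible:
        return None
    return max(eligible, key=lambda w: expected_utility(w, benefit))
```

Raising the accuracy threshold can force the executive onto a more expensive worker, which is the trade-off between economy and adequate accuracy mentioned in Section 1.2.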

A novel and uniform method of
describing arbitrary functions on the
unit sphere (which define "museum-viewable"
volumes) is under
investigation, with immediate
application to anatomical structures
[Schudy 1978]. The idea is related to
the well-known Fourier descriptions of
two-dimensional shape. Volumes are
modelled and described as the leading
coefficients in certain spherical
harmonic expansions of the volume
functions. This method also allows
least squared error fitting of volumes
in coefficient space, which interfaces
nicely with routines which locate the
three-dimensional boundaries of volumes
in image data.
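The least-squares fit in coefficient space can be illustrated with the lowest-order terms. The sketch below assumes the volume is described by a radius function r(theta, phi) on the unit sphere and fits only the degree 0 and degree 1 real spherical harmonics, which are proportional to 1, x, y, and z on the sphere. The normalization constants are dropped, the normal equations are solved with a small hand-written elimination routine, and the function names are hypothetical; higher-order terms would be added to the basis in the same way.

```python
import math

def basis(theta, phi):
    """Unnormalized real harmonics of degree 0 and 1: 1, x, y, z."""
    x = math.sin(theta) * math.cos(phi)
    y = math.sin(theta) * math.sin(phi)
    z = math.cos(theta)
    return [1.0, x, y, z]

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_radius(samples):
    """Least-squares coefficients for samples given as (theta, phi, r)."""
    rows = [basis(t, p) for t, p, _ in samples]
    rhs = [r for _, _, r in samples]
    n = len(rows[0])
    AtA = [[sum(row[i] * row[j] for row in rows) for j in range(n)]
           for i in range(n)]
    Atb = [sum(row[i] * r for row, r in zip(rows, rhs)) for i in range(n)]
    return solve(AtA, Atb)
```

Boundary points located in image data would play the role of the samples; the fitted coefficients then serve as the compact description of the volume.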

3.  Application  in  Aerial  Image  Analysis 

The three-level organization of
image analysis (strategist, executive,
worker) and a further exploration of
useful procedural description mechanisms
are the objects of study in automatic
photo-interpretation work [Lantz 1978].

The object is to use the sorts of
knowledge-based inferencing used by
skilled photointerpreters, along with
models inspired by photointerpretation
keys for identifying small industries,
to do reliable and flexible
identification of a few types of small
industrial installations. Imagery has
been acquired from a Rochester, N.Y.
mapping firm and from RADC in Rome, N.Y.


systems.  An  Optronics  Colorscan  C-4100 
drum  scanner  is  on  site  and  interfaced 
to  the  Vision  Eclipse.  The  fast  (50KB) 
link  to  the  PDP-KL10  has  been  completed 
and  is  operating  well. 


4.  Image  Encoding  and  Transmission 

4.1  Hierarchical  Image  Encodings 

Communication of images, and of
information about images, is an important
part of any image understanding project.
We  have  been  investigating  the  use  of 
various  hierarchical  image  encodings. 

One  of  the  image  transmission  schemes  we 
have  investigated  is  closely  related  to 
"pyramid"  data  structures.  We  have 
demonstrated  that  high  resolution  raster 
images  can  be  effectively  transmitted 
over  relatively  low-bandwidth  lines  by 
sending  a series  of  low  resolution 
approximations,  which  converge  to  the 
final  image  (Sloan  and  Tanimoto,  1978). 
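
The idea of sending successively finer approximations can be illustrated with a minimal sketch. It assumes a square image whose side is a power of two and a block-mean reduction rule; the text does not say which approximation the cited scheme uses, and all function names here are hypothetical.

```python
def reduce_once(img):
    """Halve resolution by averaging each 2x2 block (an assumed reduction rule)."""
    n = len(img) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1] +
              img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(n)] for r in range(n)]


def build_pyramid(img):
    """Return levels ordered coarse to fine; the last level is the image itself."""
    levels = [img]
    while len(levels[-1]) > 1:
        levels.append(reduce_once(levels[-1]))
    return levels[::-1]


def expand(level, size):
    """Replicate pixels so that a coarse level fills a size x size display."""
    k = size // len(level)
    return [[level[r // k][c // k] for c in range(size)] for r in range(size)]


def progressive_views(img):
    """Yield what a receiver could display after each level has arrived."""
    size = len(img)
    for level in build_pyramid(img):
        yield expand(level, size)


image = [[0, 0, 8, 8],
         [0, 0, 8, 8],
         [4, 4, 2, 6],
         [4, 4, 6, 2]]
views = list(progressive_views(image))  # 1x1, 2x2, then full-resolution view
```

For simplicity the sketch resends every level in full. A practical scheme would presumably transmit only the information needed to refine the previous level.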

A second hierarchical encoding is
described elsewhere (Ballard, 1979) in
these proceedings. A Strip Tree is a
simple  encoding  for  linear  features. 
Efficient  algorithms  have  been  developed 
to  take  advantage  of  the  hierarchy. 

4.2  Composition  and  Re-interpretation 
of  Images 

It  is  often  convenient  to  specify 
an  image  in  terms  of  the  combination  of 
several  existing  images,  rather  than 
transmit  an  entire  new  image.  The 
combination  or  re-interpretation  may 
sometimes  be  performed  with  relatively 
simple  hardware  devices.  We  have 
developed  and  implemented  several  such 
techniques  based  on  the  "video  lookup 
table"  supplied  with  our  Grinnell  GMR-26 
display  (Sloan  and  Brown,  1979).  These 
techniques  are  currently  being  used  to 
overlay  map  features  on  aerial  images, 
display  three-dimensional  surfaces  under 
quickly  varying  lighting  conditions,  and 
show  short,  repetitive  motion  sequences. 
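
Re-interpretation through a lookup table can be sketched as follows. The sketch assumes a stored image of small integer codes in which one bit plane carries an overlay; the bit layout, colours, and function names are illustrative assumptions and do not describe the GMR-26 itself.

```python
def display(index_image, lut):
    """Map each stored pixel code through the lookup table to a displayed colour."""
    return [[lut[p] for p in row] for row in index_image]


def overlay_lut(base_lut, overlay_bit, overlay_colour):
    """Build a table in which any code with the overlay bit set shows overlay_colour."""
    return [overlay_colour if (i & overlay_bit) else base_lut[i & ~overlay_bit]
            for i in range(len(base_lut))]


GREY_BITS = 0b011     # assumed: low two bits hold the grey level
OVERLAY_BIT = 0b100   # assumed: third bit marks a map feature

# A table that ignores the overlay plane and shows only the grey level.
grey_lut = [((i & GREY_BITS) * 85,) * 3 for i in range(8)]
# A table that paints overlay pixels red and leaves the rest unchanged.
map_lut = overlay_lut(grey_lut, OVERLAY_BIT, (255, 0, 0))

frame = [[0, 1 | OVERLAY_BIT],
         [3, 2]]
plain = display(frame, grey_lut)     # aerial image alone
with_map = display(frame, map_lut)   # same stored pixels, map feature visible
```

Switching between the two views changes only the eight-entry table, never the stored pixels, which is why such re-interpretation is cheap compared with transmitting a new image.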

5.  Component  Building 
5.1.  Hardware 

The  Grinnell  GMR-26  display  device 
is  on  site  and  DMA-interfaced  to  an 
Eclipse  computer.  32K  of  core  has  been 
added  to  the  Vision  Eclipse,  which  is 
also  used  for  research  in  distributed 
computing  (see  Section  5.2).  The 
original 60MB disk has been replaced
with a 300MB one, and another 300MB disk
has  also  been  installed  along  with  a 
much  faster  controller,  leading  to 
greatly  enhanced  performance.  We  are 
acquiring  terminals  and  investigating 
how  to  meet  our  everyday  computing  needs 
by  commercial,  home-built,  or 
combination  intelligent  terminal 


5.2.  Software 


Advanced system software support is
now used routinely, and more is under
development. Communications protocols
and distributed computing packages
[Rovner 1978, Feldman 1978, Sheininger
and Sabbah 1978, Selfridge 1978, Sloan
1978] have been developed to allow
access to the GMR-26 through the local
ALTO computers or the remote PDP-10, to
achieve reliable transmission between
distributed processes, to produce
graphics and halftone images on ALTO
screens from the PDP-10, and to allow
file transfer and telnet to the Arpanet.
The IPCF in the TOPS-10 operating system
is the basis for communication between
PDP-10 jobs, and these jobs may now
create RIG messages and send them to the
local operating system for disposition.

At Rochester, the RIG message is the
lingua franca that allows processes on
remote machines to command the GMR-26,
perform file manipulations, and carry
out other operations. Some of our work
has been utilized by other image understanding
groups, most extensively at SRI. Some
student projects in our Computer Vision
courses are aimed at producing useful
system software for vision, and the
common departmental interest in
distributed computing assures that new
and co-operative efforts using the
distributed computation and
communications packages will be launched
frequently. A comprehensive library of
vision routines (Sloan 1977-78) has been
developed, centralized, documented, and
incorporated into the NEXUS system.

These routines allow interactive users a wide
range of image-processing and display
(graphics, halftone, color and B&W TV)
capabilities. A program to acquire
images from the Optronics scanner and
package them according to our Raster
Image File Format (Selfridge and Sloan,
1979) has been developed and is in
routine use.


6. Motion Understanding


Understanding motion pictures has
always presented an unusually difficult
problem to computer vision efforts. The
compelling gestalt induced in humans by
moving objects is not well understood,
and so there is little leverage on the
immediate problems resulting from the
large mass of data in multi-frame
images. We are hoping to make progress
first on a pared-down version of the
problem which nevertheless offers an
interesting set of perceptual phenomena
to model. The domain is multi-frame
images of animal motion; initial
research is being carried out on
sequential images of points of light
attached to joints. This data can give
humans a strong perception of coherent
motion, and present work is aimed at
understanding how we correctly identify
points (about 13 in all in present data)
from frame to frame, and how we segment
the resulting moving points into
meaningful body parts. Ultimately, the
results will be applied to multi-frame
grey-scale images. Data presently comes
from a program which simulates a range
of human walking motion in 3-D. The
program is a useful theoretical tool,
since it allows direct access (not
mediated by vision) to movement
parameters and point locations. It is
also a useful psychological research
tool, since with it one can
inexpensively investigate limits in
human performance.

7.  Texture 

Textural  areas  can  be  thought  of  as 
those  parts  of  an  image  where 
segmentation  based  on  normal  similarity 
measures  fails.  Meaningful  analysis  of 
textured  areas  must  include 
discrimination  between  different 
textures  and  detection  of  parts  of  the 
same  texture.  The  similarity  of 
textures  which  are  identical  except  for 
a scale  change,  a rotation,  or  a 
different  range  of  intensities  must  be 
recognized. 

We  approach  the  texture  problem  by 
dividing  texture  regions  into  meaningful 
sub-elements  of  similar  intensity  sample 
points,  then  using  rotation-  and 
scale-invariant  shape  measures  to 
characterize  these  regions  and  finally 
determining  spatial  relationships  among 
our  sub-elements.  By  using  a decision 
tree  program  structure,  easily 
discriminated  textures  are  separated 
quickly,  and  more  complex  textural 
structure  is  extracted  only  when 
necessary  (Maleson,  1978). 

8.  Programming  Language  Development 

The  Smart  Compiler  and  Distributed 
Computation  research  groups  are 
cooperating  on  a language  for  research 
into  both  these  fields  (Ball  1978).  It 
will  contain  the  ideas  of  PLITS, 
together  with  improvements  and 
extensions  gleaned  from  the  SAIL-PLITS 
implementations of the past. There are
several separate ways in which the
programming language developments are
affecting Image Understanding research
in our laboratory and elsewhere (Feldman
& Williams 1977; Feldman, 1978). Many
of the ideas developed in this work are
being heavily used in image
understanding.


Introduction

In a recent article, Marr & Poggio (1979)
set out a computational theory of human stereo
vision. According to this theory, the human
visual processor solves the stereoscopic
matching problem by means of an algorithm that
consists of five main steps: (1) The left and
right images are each filtered at different
orientations with bar masks of four sizes that
increase with eccentricity; these masks have a
cross-section that is approximately the
difference of two Gaussian functions, with space
constants in the ratio 1:1.75. (2) Zero-crossings
in the filtered images are found,
along scan-lines lying perpendicular to the
orientation of the mask. Termination points of
lines and edges are also localized. (3) For
each mask size, matching takes place between
pieces of zero-crossing of the same sign and
roughly the same orientation in the two images,
for a range of disparities up to about the width
of the mask's central region. Within this
disparity range, Marr & Poggio showed that false
targets pose only a simple problem. (4) The
output of the wide masks can control vergence
movements, thus causing small masks to come into
correspondence. In this way, the matching
process gradually moves from dealing with large
disparities at low resolution to dealing with
small disparities at high resolution. (5) When
a correspondence is achieved, it is stored in a
dynamic buffer, called the 2½-dimensional
sketch.

There are several reasons why it is
important to implement a computational theory
like this one. Firstly, it serves as a means of
testing the sufficiency of the theory. That is,
by running the program on various pairs of
stereo images, one can examine the performance
of the program, and hence of the theory itself,
provided the program is an accurate
representation of that theory. Secondly, such
an implementation serves as a useful feedback
device, enabling one to test the critical
factors of the theory and to illuminate errors
or omissions of the theory. And thirdly, it
enables one to assess the validity of
assumptions made by the theory, concerning for
example the statistical structure of images.

This  article  describes  an  implementation  of 
the  stereo  theory  that  was  written  with 
particular  emphasis  on  the  matching  process.  We 
first  set  out  the  overall  design  of  the  program, 
and  then  we  show  two  examples  and  assess  the 
program's  overall  performance  on  various  types 
of  image. 


Design  of  the  program 

The implementation is divided into five
modules, roughly corresponding to the five
sections in the summary that we gave above. We
describe each in turn.

I Input and convolution

The input to the program consists of a
stereo pair of images, digitized on an Optronix
Photo-reader. The sizes of these images are 320
x 320, and grey-level resolution is 8 bits (256
grey levels).

The main difference in this first stage
between our program and the original description
of the theory is that the initial convolutions
are performed with non-oriented filters. This
is possible because Marr & Hildreth (1979)
showed that, under rather weak conditions, the
zero-crossings obtained with the Laplacian were
equivalent to those obtained with a set of
second directional derivatives. Although in
theory the optimal filter is $\nabla^2 G$, where $\nabla^2$ is
the Laplacian and $G$ is a two-dimensional
Gaussian distribution, we have used its
approximation by radially symmetric differences
of Gaussians, with space constants in the ratio
1:1.75 (after Wilson & Bergen 1979).
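
As a rough illustration of such a filter, the sketch below evaluates a radially symmetric difference of Gaussians with the 1:1.75 ratio and relates its excitatory space constant to the diameter of its positive central region. It assumes both Gaussians are normalised to unit volume, which the text does not state, and it deals only with the two-dimensional width, not the projected one-dimensional width. All names are hypothetical.

```python
import math

RATIO = 1.75  # inhibitory : excitatory space constant, as quoted in the text


def gaussian2d(r, sigma):
    """Unit-volume two-dimensional Gaussian evaluated at radius r (assumed form)."""
    return math.exp(-r * r / (2.0 * sigma * sigma)) / (2.0 * math.pi * sigma * sigma)


def dog(r, sigma_e, ratio=RATIO):
    """Radially symmetric difference of Gaussians at radius r."""
    return gaussian2d(r, sigma_e) - gaussian2d(r, ratio * sigma_e)


def central_width(sigma_e, ratio=RATIO):
    """Diameter of the positive central region, found by setting dog(r) = 0."""
    r0 = 2.0 * sigma_e * math.sqrt(math.log(ratio) / (1.0 - 1.0 / ratio ** 2))
    return 2.0 * r0


def sigma_for_width(w, ratio=RATIO):
    """Excitatory space constant that gives a central region of diameter w."""
    return w / central_width(1.0, ratio)


sigma = sigma_for_width(9.0)  # space constant for a mask with a 9-pixel centre
```

Under these assumptions the filter is positive inside radius w/2 and negative outside it, so `dog(0.0, sigma)` is positive and `dog(8.0, sigma)` is negative for the 9-pixel mask.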

There remains only one free parameter to
specify the filter completely: its overall size,
which is conveniently specified by the width w
of the filter's central excitatory region.
Wilson & Giese's (1978) data indicated values of
w for the central fovea of the human visual
system in the range 3' to 12' of visual arc. If
one considers the experimental conditions under
which Wilson obtained his data, it becomes
apparent that these measurements essentially
correspond to a projection of the two-dimensional
mask onto a line. If one projects a
radially symmetric difference of Gaussians (or
DOG) filter onto a line, one obtains a one-dimensional
DOG with slightly smaller w, in fact

Taking this into account, and using the
figure of 20"-30" for the separation of cones in
the fovea (see Cornsweet 1970 p. 356), one
arrives at values of w in the range 9 to 35
pixels. In our program, three initial filters
are used, with w values of 9, 17 and 35 pixels.
The supports of these masks are roughly 1000,
3600 and 15200 pixels respectively.

The actual convolutions were carried out by
a LISP machine constructed at the MIT Artificial
Intelligence Laboratory, using additional
hardware specially constructed for the purpose.



II  Detection  and  description  of  zero- 
crossings 

In theory, the elements that are matched
between images are (i) zero-crossings whose
orientations are not horizontal, and (ii)
terminations. From the point of view of the
false target problem it is the zero-crossings
that cause the real difficulties, and our
program uses only zero-crossings.

Because  we  can  ignore  horizontally  oriented 
segments,  the  detection  of  zero-crossings  can  be 
accomplished  by  scanning  the  image  along 
horizontal  lines,  looking  for  either  two 
horizontally  adjacent  pixels  containing 
convolution  values  of  opposite  sign,  or  three 
horizontally  adjacent  pixels,  the  middle  one  of 
which is zero, and the other two containing
convolution  values  of  opposite  sign. 
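
The scan just described can be sketched for a single row of convolution values. The reported position and the sign convention (positive when the values rise from negative to positive, left to right) are assumptions made for illustration; the function name is hypothetical.

```python
def zero_crossings(row):
    """Find zero-crossings along one horizontal scan line.

    Returns (position, sign) pairs. A crossing between two pixels is placed
    halfway between them; a crossing through an exactly zero pixel is placed
    on that pixel. sign is +1 for negative-to-positive, -1 otherwise.
    """
    found = []
    n = len(row)
    for x in range(n - 1):
        a, b = row[x], row[x + 1]
        if a * b < 0:
            # two adjacent pixels of opposite sign
            found.append((x + 0.5, 1 if b > 0 else -1))
        elif b == 0 and x + 2 < n and a * row[x + 2] < 0:
            # three adjacent pixels, middle one zero, outer two of opposite sign
            found.append((float(x + 1), 1 if row[x + 2] > 0 else -1))
    return found


crossings = zero_crossings([3, 1, -2, -1, 0, 2, 0, 0, 1])
```

Runs of two or more zeros, as at the end of the example row, produce no crossing under this rule.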

In  addition  to  their  location,  we  need  the 
signs  of  the  zero-crossings,  and  a rough 
estimate  of  their  local  orientation.  In  the 
present implementation, the orientation at a
point  on  a zero-crossing  segment  is  computed  as 
follows.  Assign  some  arbitrary  but  consistent 
axis  in  the  image.  Find  the  line  through  the 
point  in  question  which  minimizes  the  total 
separation,  from  the  line  itself,  of  the  points 
of  the  zero-crossing  in  the  neighbourhood  of  the 
point. 
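
One way to realise this fit is sketched below. It assumes that "total separation" means the sum of squared perpendicular distances, which the text does not specify, and it measures the angle from the image x axis. The function name is hypothetical.

```python
import math


def local_orientation(point, neighbours):
    """Orientation, in degrees from the x axis, of the line through `point`
    that minimises the summed squared perpendicular distance to `neighbours`
    (nearby points on the same zero-crossing contour)."""
    px, py = point
    sxx = syy = sxy = 0.0
    for x, y in neighbours:
        dx, dy = x - px, y - py
        sxx += dx * dx
        syy += dy * dy
        sxy += dx * dy
    # Closed-form direction of the principal axis of the scatter about `point`.
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))


vertical = local_orientation((5, 5), [(5, 3), (5, 4), (5, 6), (5, 7)])
diagonal = local_orientation((5, 5), [(4, 4), (6, 6)])
```

Because only a rough estimate is needed for matching, the choice of squared rather than absolute distance is unlikely to matter much in practice.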

III Matching

Matching between the left and right images
takes place between zero-crossings obtained from
the same sized mask, having the same sign, and
having roughly the same orientation. The
matching process begins with a rough disparity
value for the region, whose derivation we
describe below, and it implements the second of
the matching algorithms described by Marr &
Poggio (1979).

Given a zero-crossing in the left image,
the search for a counterpart in the right image
is centered on the corresponding location there,
shifted by the a priori disparity value. The
area to be searched is divided into three pools,
two larger convergent and divergent regions, and
a smaller one lying centrally between them.
Together these pools span a disparity range
equal to 2w, where w is the (corrected) width of
the central excitatory region of the
corresponding convolution mask.

If one zero-crossing of the appropriate
sign and orientation (within 30°) is found
within a pool, the location of that crossing is
transmitted to the matcher. If two candidate
zero-crossings are found within one pool (an
unlikely event), the matcher is notified and no
attempt is made to assign a match for the point
in question. If the matcher finds a single
crossing in only one of the three pools, that
match is accepted, and the disparity associated
with the match is recorded in a buffer. If two
or three of the pools contain a candidate match,
the algorithm records that information for
future disambiguation.
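
The case analysis above can be sketched for a single left-image zero-crossing. The sketch assumes the candidates have already been filtered by sign and orientation, that the three pools together cover offsets from -w to +w about the shifted location, that negative offsets are "convergent", and that the central pool has a half-width of one pixel. None of these details is given in the text, and all names are hypothetical.

```python
def pool_of(offset, w, central_half_width):
    """Name the pool containing an offset from the search centre, or None."""
    if abs(offset) > w:
        return None
    if abs(offset) <= central_half_width:
        return "zero"
    return "convergent" if offset < 0 else "divergent"


def match_point(x_left, candidates_right, prior_disparity, w, central_half_width=1.0):
    """Classify one left-image crossing as matched, ambiguous, blocked or unmatched.

    Returns (status, disparities), with disparities measured from x_left.
    """
    centre = x_left + prior_disparity
    pools = {}
    for x in candidates_right:
        name = pool_of(x - centre, w, central_half_width)
        if name is not None:
            pools.setdefault(name, []).append(x - x_left)
    if any(len(found) > 1 for found in pools.values()):
        return ("blocked", [])        # two candidates in one pool: no match attempted
    disparities = sorted(found[0] for found in pools.values())
    if not disparities:
        return ("unmatched", [])
    if len(disparities) == 1:
        return ("matched", disparities)
    return ("ambiguous", disparities)  # kept for later disambiguation


unique = match_point(100, [103.0], 0, 9)
double = match_point(100, [96.0, 104.0], 0, 9)
```

In the example, `unique` is accepted at once, while `double` has one candidate in each of the outer pools and is deferred.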

Once all possible unambiguous matches have
been made, the algorithm attempts to locally
disambiguate points with double or triple
matches. This is done by searching a
neighbourhood about the point in question and
recording the disparity sign of the matches
within that neighbourhood. If the ambiguous
point has a potential match of the same
disparity sign as the dominant type within the
neighbourhood, then that is chosen as the match.
(This is the "pulling" effect of Julesz & Chang
1970.) Otherwise, no match is assigned to the
point.
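
A minimal sketch of this disambiguation step follows. It assumes the disparity sign of each match is recorded as the name of the pool in which it was found ("convergent", "zero" or "divergent"), and that a tie between the most frequent types means there is no dominant type. Both are illustrative assumptions, and the function name is hypothetical.

```python
from collections import Counter


def pull(candidates, neighbour_pools):
    """Resolve an ambiguous point from the matches accepted around it.

    candidates: {pool_name: disparity} for the ambiguous point.
    neighbour_pools: pool names of unambiguous matches in the neighbourhood.
    Returns the chosen disparity, or None if no match can be assigned.
    """
    tally = Counter(neighbour_pools).most_common(2)
    if not tally:
        return None                       # no neighbours to consult
    if len(tally) == 2 and tally[0][1] == tally[1][1]:
        return None                       # no single dominant type
    dominant = tally[0][0]
    return candidates.get(dominant)       # None if no candidate of that type


chosen = pull({"convergent": -4.0, "divergent": 4.0},
              ["divergent", "divergent", "convergent"])
```

Here the neighbourhood is mostly divergent, so the divergent candidate is selected.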

Two  points  remain  to  be  specified.  The 
first  is  how  the  a priori  disparity  value 
mentioned at the beginning of this section is
obtained.  For  the  largest  mask,  this  rough 
shift  value  is  initially  taken  to  be  zero;  in 
other  words,  the  images  are  assumed  to  be 
aligned.  For  the  smaller  masks,  the  initial 
rough  shift  value  is  found  by  averaging  the 
values  obtained  by  the  previous  mask,  for  the  k 
closest  neighbours  to  the  point.  One  could  also 
use  a histogram  of  local  values  to  obtain  the 
rough  shift  value.  However,  for  areas  which 
contain  pieces  of  two  distinct  surfaces, 
separated  by  a sharp  discontinuity,  the 
histogram  will  be  forced  to  choose  a value 
corresponding  to  disparities  belonging  to  only 
one  of  the  two  surfaces.  If  the  discontinuity 
is  large  enough,  this  may  cause  the  portions  of 
the  second  surface  to  be  shifted  out  of  the 
range  of  the  next  mask,  and  hence  such  points 
will  not  be  assigned  valid  matches.  In  the  case 
of  an  average  of  the  local  values,  this  will  not 
happen.  Instead,  the  average  value  will  lie 
somewhere  between  the  disparities  of  the  two 
surfaces.  Hence,  it  is  possible  that  there  will 
be  candidate  matches  for  both  surfaces  which  are 
within  the  range  of  the  matcher.  In  the  case  of 
areas  which  contain  only  pieces  of  the  same 
surface,  there  is  virtually  no  difference 
between  the  use  of  a histogram  and  the  use  of  a 
local  average. 

Finally, there is the possibility that the
region under consideration is not within the 2w
disparity range. This situation is detected and
handled by tessellating the image, and within
each tessellation square, performing the
following operation. Given a tentative shift
value, an attempted matching is performed for
all the zero-crossings within a particular
tessellation. Any crossing for which there is no
match is marked as such. If the proportion of
unmatched points exceeds a threshold of 0.3
(Marr & Poggio 1979) then the region is declared
to be out of range and a new shift value is
tried. The process continues until either the
region comes into range or all possible shift
values are exhausted.
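
The out-of-range test for one tessellation square can be sketched as a loop over tentative shift values. The matcher is passed in as a function, the order in which shifts are tried is left to the caller, and the toy data exist only to exercise the loop; all names are hypothetical.

```python
def find_shift(try_match, crossings, shifts, threshold=0.3):
    """Return the first shift for which the square is in range, else None.

    try_match(x, shift) should return True if crossing x finds a match when
    the search is centred using this shift.
    """
    for shift in shifts:
        unmatched = sum(1 for x in crossings if not try_match(x, shift))
        if crossings and unmatched / len(crossings) <= threshold:
            return shift      # proportion unmatched is acceptable: in range
    return None               # every candidate shift was out of range


# Toy square: three crossings near disparity 20 and one outlier, with w = 9.
true_disparity = {10: 20, 30: 21, 50: 19, 70: 60}


def toy_matcher(x, shift):
    return abs(true_disparity[x] - shift) <= 9


in_range_shift = find_shift(toy_matcher, list(true_disparity), [0, 18, 36])
```

With a shift of 0 nothing matches, so the square is declared out of range. With a shift of 18 only the outlier fails (one quarter of the points), which is under the 0.3 threshold.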

The overall effect of the matching process,
as driven from the left image, is to assign
disparity values to most of the zero-crossings
obtained from the left image. An example of the
output appears in the figures. In this array, a
zero-crossing at position (x, y) with associated
disparity d has been placed in a three-dimensional
array with coordinate (x, y, d). For
display purposes, the array is shown as viewed
from a point some distance away. The heights in
the figure correspond to the assigned
disparities.

IV Overall structure of the process

The complete algorithm, as we have
implemented it, uses three mask sizes. To begin
with, the two views of the scene are convolved
with the largest mask and are passed to the
algorithm. The zero-crossings and their
orientation are computed. An initial horizontal
and vertical registration of zero is assumed for
the entire image and the matching process is
performed under this assumption. Any points
with either ambiguous matchings or with no match
are marked as such.

Any tessellation square whose percentage of
unmatched points exceeds a certain threshold
is assigned a new registration value,
and the matching is performed again. This
continues until either all tessellations are
matched or all possible registrations have been
attempted. The ambiguously matched points are
then considered, and the pulling effect is
applied to them. If ambiguity still remains
after this process, the point is not assigned a
match.

The algorithm now passes to the next
smaller mask and repeats the process of
obtaining the zero-crossings and their
orientation. For the matching process, the
initial registration for each tessellation is
simply the average of the disparities assigned
in the neighbourhood by the previous matching
process. The rest of the process proceeds as
before.

The final output is thus a sparse disparity
map, with disparities assigned along most
portions of the zero-crossing contours obtained
from the smallest masks.

Examples and Assessment of Performance

The implementation of the stereo theory was
tested on a number of images. This section
contains examples of the various stages of the
algorithm, run on several images, and an
assessment of the performance of the
implementation.

A good tool for testing the performance of
the implementation, and hence of the theory, is
the random dot stereogram. For example, a
random dot stereogram consisting of a plane
separated in depth from a second plane contains
well demarked disparity values. Thus, the
performance of the implementation can be easily
assessed.

The first pattern had a dot density of 50%.
Each dot was a square four pixels on a side.
This corresponds to a dot of approximately two
minutes of visual arc. The total pattern was
256 pixels on a side. The actual region on
which matching took place was 150 pixels on a
side. The central plane of the figure was
shifted 12 pixels in one image relative to the
other. In the final disparity map assigned
after the matching of the smallest channel,
approximately 0.6% of the points were
incorrectly matched.

A similar test was run on a pattern with a
dot density of 10%, and one point (0.1%) was
incorrectly matched. For a pattern with a dot
density of 5%, every point was correctly
matched.


Discussion

The figures exhibit the analysis for two
images, a 50% random-dot stereogram and a
natural image, in this case of a sculpture by
Henry Moore. Although one cannot make precise
measurements for natural situations, the program
appears to be performing on them at least as
well as on the more difficult stereograms.

1. A 50% random dot pattern. The two left columns indicate the convolutions of
the left and right images with masks of size w = 35, 17 and 9 respectively from
top to bottom. The two right columns indicate the zero-crossings obtained from
the convolutions in the leftmost columns.

2. A 50% random dot pattern. The left and right images are shown at the top. The
three lower figures indicate an orthographic view of the disparity maps obtained
by matching the zero-crossing descriptions of figure 1. A point in the image with
coordinates (x, y) and an assigned disparity value of d is portrayed in this three
dimensional system as the point (x, y, d). Here, the height of the bright points
above the plane indicates the disparity values.

3. A natural image (a Henry Moore sculpture). The two left columns indicate the
convolutions of the left and right images. The two right columns indicate
zero-crossings obtained from the convolutions.




4. A natural image (a Henry Moore sculpture). The left and right images are shown
at the top. The lower three figures indicate the disparity maps obtained from the
three different channel sizes.


GEOMETRIC REASONING IN ACRONYM


Thomas O. Binford and Rodney A. Brooks


Artificial Intelligence Laboratory, Computer Science Department
Stanford University, Stanford, California 94305


Abstract

ACRONYM demonstrates mechanisms for generic
interpretation of images in a way that is generalizable. It
includes a high level modeling language for natural
communication with the user in terms of object models and
observation models. It uses a rule-based geometric reasoning
system to predict symbolic appearances of objects. This
geometric reasoning capability enables ACRONYM to integrate
knowledge and data at different levels. ACRONYM constructs
a Picture Graph using surface descriptors [Arnold] and ribbon
descriptors [Brooks] obtained from edge descriptors [Nevatia
and Babu]. ACRONYM has a powerful syntactic matcher. This
report describes geometric reasoning with generalized cones to
build the Observability Graph for aircraft at an airport.
Increments to the modeling system were necessary. Initial efforts
in formalizing the interpretation process are described in a
separate paper. Brooks describes a rule-based region descriptor
which demonstrates high level guidance of a low level
description process. ACRONYM is in the midst of its first test
on real data, aircraft at a passenger terminal.


Introduction 

ACRONYM is intended to demonstrate mechanisms
which provide key capabilities for interpreting scenes. It is
intended to function within a scenario of cartography and
photointerpretation. These capabilities include:

1. Interpretation should be generic with respect to objects and
generic with respect to viewing conditions. Typical scenes
include aircraft, vehicles and buildings. It does not seem
reasonable to separately enumerate and identify all
configurations of buildings from all viewpoints. Even if that
were possible, it would leave the human designer with the effort
of defining similarity classes, e.g. jet passenger aircraft,
smokestacks, stairs, etc. Defining similarity classes can be
automated. Traditional graphics approaches might be adequate
for single object models from single viewpoints but not for
generic interpretation. Our approach for interpretation of
generic object classes is to use object models constructed from
generic parts, which are generalized cones, and to use symbolic
generic predictions of elements from these generic parts.

2. Users should be able to specify tasks in a natural and simple
way. Geometric models are natural for both the user and the
vision system. Ultimately, users will be able to instruct systems in
natural language. The representation hierarchy of ACRONYM
could serve as a bridge between natural language and standard
programming languages.

3. The system should integrate information which comes in
different levels and different forms. A photointerpreter solves a
puzzle by piecing together selected and multiple cues from
current images (image level), background information (object
level) and previous images. In doing so, he relies heavily on
spatial interpretation from stereo imaging and shadows (surface
level), and spatial knowledge about structures (volume level).
Integrating multiple cues within a single task raises technical
requirements for representation.

4. Systems for very different tasks should be constructed from a
large core of common modules and a small set of task-specific
modules. For a single system to map this wide range of task
elements onto a common set of modules, it is convenient that the
modules represent a natural decomposition of the problem into
physically meaningful elements, for example, those we use in our
own description of the problem. Our basis for generalizability is
the use of a tightly structured hierarchy of geometric
representations. A photointerpreter performs a wide range of
tasks which have widely different collateral information at the
contextual level; tasks deal with widely varying objects; they
vary greatly at the image level because of varied viewpoint,
illumination, sensors, weather, and obscuration and camouflage.

Consider  some  typical  tasks  for  ACRONYM: 

1. A photointerpreter monitors an airfield. The system identifies
and counts aircraft at frequent intervals to monitor air traffic.

2. An interpreter monitors a building complex for changes. The
system uses stereo, a model of the complex, and identification to
distinguish insignificant from significant changes. The system
might not notify the interpreter about changes from snow or
rain, or moving a vehicle, but notify him about building
additions.

3. An interpreter monitors vehicles in staging areas. The system
identifies vehicles and monitors traffic to and from the area.

Figure 1 shows the data structures and data flow, and program
modules of the ACRONYM system. Data structures are enclosed
in the inner box, while program modules surround the inner
box.


sections were normal to the spine. This generalization enables
representing a larger class of objects and is natural for aircraft
wings, for example. 3. The sweeping rule must be piecewise
linear, and continuous. The spine must be continuous and made
up of straight line segments. Circular arcs can also be used as
segments of the spine, so long as the sweeping rule is constant,
the cross section has a piecewise linear boundary and is kept
normal to the spine. The generalization here over our previous
system is in allowing piecewise linear sweeping rules, instead of
the former linear sweeping rules, and allowing a segmented
spine. The class of shapes which is now representable is no
bigger than before, as these more complex cones could be built
as structured objects. The advantages are in terms of now being
able to refer to these shapes as primitive volume elements,
where the intent of the spatial relations between object subparts
no longer need be deduced.
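
A piecewise linear, continuous sweeping rule can be illustrated with a
minimal Python sketch. The representation used here (a sorted list of
(spine position, cross-section scale) breakpoints) and the name
`sweep_scale` are assumptions made for illustration; the paper does not
specify how ACRONYM stores sweeping rules.

```python
from bisect import bisect_right
from typing import List, Tuple

def sweep_scale(breakpoints: List[Tuple[float, float]], s: float) -> float:
    """Evaluate a piecewise linear sweeping rule at spine position s.

    breakpoints is a list of (position, scale) pairs sorted by strictly
    increasing position (a hypothetical encoding). Continuity holds
    because neighbouring linear pieces share their end breakpoint.
    """
    positions = [p for p, _ in breakpoints]
    if not positions[0] <= s <= positions[-1]:
        raise ValueError("s lies outside the spine")
    i = bisect_right(positions, s) - 1       # index of the piece containing s
    if i == len(breakpoints) - 1:            # s is exactly the last breakpoint
        return breakpoints[-1][1]
    (s0, v0), (s1, v1) = breakpoints[i], breakpoints[i + 1]
    t = (s - s0) / (s1 - s0)                 # fraction of the way along the piece
    return v0 + t * (v1 - v0)

# A wing-like rule: constant scale over the first half, then tapering.
WING_RULE = [(0.0, 1.0), (2.0, 1.0), (4.0, 0.5)]
```

A rule with a single linear piece (two breakpoints) corresponds to the
former linear sweeping rules; adding breakpoints gives the piecewise
linear case without changing the evaluation code.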

The other change we have made to the representation was
more major. A previous modeling system based on generalized
cones (Miyamoto and Binford [1975]) used a level of detail or
subpart hierarchy. It tied the spatial relation hierarchy to this
same tree structure. The original version of the ACRONYM
modeling system did likewise. Often however, there is no real
justification for this correspondence and models must be
strangely contorted to accommodate it. As a simple example,
consider that in modeling an aircraft it is natural to affix the
two wings to the fuselage, so that their position in space is
described in terms of the position and orientation of the
fuselage. However it is not natural to describe an aircraft at
some level of description as merely a fuselage, and include wings
only at a refined level. In fact often the wings will appear the
same size as the fuselage. By separating the two hierarchies we
are able to retain the spatial relation organisation described
above, while having a top level decomposition of the aircraft
into the fuselage and the two wings.
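
The separation of the two hierarchies can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
`Part` class, its field names, and the numeric offsets are hypothetical
and are not taken from the ACRONYM implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Part:
    name: str
    # Subpart (level of detail) hierarchy: names of component parts.
    subparts: List[str] = field(default_factory=list)
    # Affixment (spatial relation) hierarchy: the part this one is
    # positioned relative to, kept independent of the subpart tree.
    affixed_to: Optional[str] = None
    offset: Tuple[float, float, float] = (0.0, 0.0, 0.0)

def build_aircraft() -> Dict[str, Part]:
    """Fuselage and wings are siblings in the subpart hierarchy,
    while the wings are affixed to the fuselage spatially."""
    return {
        "aircraft": Part("aircraft",
                         subparts=["fuselage", "left_wing", "right_wing"]),
        "fuselage": Part("fuselage", affixed_to="aircraft",
                         offset=(10.0, 0.0, 0.0)),
        "left_wing": Part("left_wing", affixed_to="fuselage",
                          offset=(0.0, -5.0, 0.0)),
        "right_wing": Part("right_wing", affixed_to="fuselage",
                           offset=(0.0, 5.0, 0.0)),
    }

def position(parts: Dict[str, Part], name: str) -> Tuple[float, float, float]:
    """Sum the offsets along the affixment chain (translations only)."""
    x = y = z = 0.0
    current: Optional[str] = name
    while current is not None:
        part = parts[current]
        x, y, z = x + part.offset[0], y + part.offset[1], z + part.offset[2]
        current = part.affixed_to
    return (x, y, z)
```

Here the top level decomposition lists the fuselage and both wings
together, yet each wing's position is still expressed relative to the
fuselage. Orientation is omitted to keep the sketch short.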

Work has begun on a geometric editor for constructing object
models. Initially it will provide some of the capabilities of
GEOMED [Baumgart] for convenient interaction, combined
with track ball and other analog input devices. Beyond these
initial stages, we plan to include geometric reasoning capabilities
discussed below to provide the ability to understand implied
relations and to draw inferences from examples.

Reasoning about Geometry

The Predictor and Planner is a critical module of the
ACRONYM system. ACRONYM is the first vision system to
incorporate a general reasoning system. It is necessary because
we wish to make symbolic predictions of the appearance of
generic objects, from generic viewpoints. We have previously
(Brooks, Greiner and Binford [1978b]) discussed the structure of
the Observability Graph. Its automatic production is a difficult
task which requires reasoning about a large body of diverse
knowledge. We have chosen to use a rule-based system to
facilitate experimentation, and to provide additivity of new
knowledge. We will first describe the control structure we have
implemented for our rules, then discuss some of the techniques
used in our early sets of rules to produce the Observability
Graph.


Geometric  Modeling 

In previous papers (Brooks, Greiner and Binford [1978a,
1978b]) we have given detailed explanations of the geometric
models used by the ACRONYM system. Recently we have made
two major generalisations to the representations we use.
Geometric modeling supports a broader subclass of generalised
cones, and the subpart and spatial relation hierarchies have
been separated into distinct and independent structures.

The subclass of generalized cones we now allow can be
summarised as follows: 1. The cross sections must have a
boundary which can be decomposed into straight line segments
and circular arcs. 2. The cross section can be kept at any
constant angle while being swept along the spine. Formerly, cross


[Block diagram; recoverable labels: INTERPRETATION GRAPH, MATCHER,
PREDICTOR AND PLANNER, OBSERVABILITY GRAPH.]

The basic control strategy we have chosen is to have our
rules consequent driven (i.e. goal-directed, or backward chained).
The rules can be interpreted to mean that the consequents
should be asserted if the conjunction of the antecedents can be
proved. It is thus easy to write rules which traverse either the
affixment structure, or subpart structure in a depth first
manner. This is the search order which provides the natural
way to decompose an object - i.e. process the subparts first, then
determine the relationships between those subparts to describe
the object. The same is true of affixment structures. The depth
first nature of the Object Graph traversal means that
inadequacies in some given area of the Object Graph will
tend to be noticed around the same time and thus any questions
to the user will tend to be naturally grouped into coherent topics
of interest (see Davis [1976]). This simple strategy is not quite
sufficient for the task at hand, however. In our current system it
is possible that once invoked and "fired" a rule may take control
for a while, doing some forward reasoning, and possibly setting
up new goals which are attempted before control is handed back
to the original invoking mechanism. However, depth-first is an
unnatural order for perceptual strategies. The subpart
hierarchy provides an approximate ordering for perceptual
search. For example, to find an aircraft, find its fuselage and
wings first, then find its stabilisers and engine pods.

Davis, Buchanan and Shortliffe [1975] use consequent driven
rules in the MYCIN medical diagnosis system. Our first
implementation of the rule-based predictor and planner closely
followed the MYCIN model (Brooks, Greiner and Binford
[1978b]). However, we are working with more complex,
structured data, and have gradually moved to a rather different
system.

MYCIN-like systems use a single, uniformly structured
data representation to hold both the original facts and the
assertions made during the reasoning process. Rules both read
and write into this single collection of object-attribute-value
triples. Our current system has three data sites. The
Object/Context Graph (including extra interaction with the
user) is a read-only site from which originate most of the facts
used in a deduction. The Observability Graph is an essentially
write-only site. Thirdly, there is an assertion space (or short term
memory - STM). This consists of triples of a descriptor, a
context and a value and is both a read and write site. Assertions
in the short term memory can be thought of as defining the
value of a function (the descriptor) on some element of its
domain (the context). The descriptor is an S-expression -
currently these are used purely syntactically via tests for equality,
but later semantic information may be explicitly attached; the
context is a structured fragment of the Object Graph. Many
descriptors are treated as predicates by the rules, and so only
have values true or false. Production of the Observability
Graph is the reason for existence of the Predictor and Planner.
However internally, the Observability Graph is produced only
as a side effect of manipulations of assertions in STM. The
reasoning system can be viewed as trying to prove some theorem
in the assertion space, by using rules of deduction (described
below). For instance the top level goal for producing the
Observability Graph for a model called AIRPORT would be to
achieve the triple <(OBSERV GRAPH), AIRPORT, T> in
STM. In finding a proof of the desired theorem, these rules are
invoked, and their side effects build a correct Observability
Graph. Thus every valid proof of a theorem defines some
structure for the Observability Graph.


Our rules have three components: premises (these are the
antecedents), side effects and consequents (possibly multiple
consequents for a single rule). Rules are indexed on their
consequents. The backward chaining strategy finds all rules
which might possibly satisfy the current goal. They are tried in
turn until one succeeds. A rule is tried by recursively trying to
satisfy its premises, until either one fails (whence the rule is
discarded and the next rule tried), or until all premises have
been satisfied. In this case the side effects are carried out, and
then the consequents are asserted - satisfying the original goal.
Thus the control of the reasoning system is via the assertions
written into the STM.
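
The control loop just described can be sketched in Python as follows.
This is a minimal illustration, not the ACRONYM code (which is written
in LISP): the names `Rule`, `Reasoner`, and `achieve`, the descriptor
strings, and the restriction that every premise is tested on the same
context as the goal are all simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

def no_side_effect(context: str, output: List[str]) -> None:
    """Default side effect: do nothing."""

@dataclass
class Rule:
    consequents: List[str]   # descriptors asserted with value True on success
    premises: List[str]      # descriptors that must be provable (antecedents)
    side_effect: Callable[[str, List[str]], None] = no_side_effect

class Reasoner:
    def __init__(self, rules: List[Rule], facts: Dict[Tuple[str, str], bool]):
        self.stm = dict(facts)        # (descriptor, context) -> value
        self.output: List[str] = []   # stand-in for the write-only graph
        self.index: Dict[str, List[Rule]] = {}
        for rule in rules:            # rules are indexed on their consequents
            for descriptor in rule.consequents:
                self.index.setdefault(descriptor, []).append(rule)

    def achieve(self, descriptor: str, context: str) -> bool:
        """Backward chain on a goal; cyclic rule sets are not guarded against."""
        if self.stm.get((descriptor, context)) is True:
            return True
        for rule in self.index.get(descriptor, []):
            # Premises are tried in order; all() stops at the first failure.
            if all(self.achieve(p, context) for p in rule.premises):
                rule.side_effect(context, self.output)
                for c in rule.consequents:
                    self.stm[(c, context)] = True
                return True
        return False

# Hypothetical two-rule example for a model called AIRPORT.
rules = [
    Rule(["OBSERV-GRAPH"], ["RUNWAY-NODE"],
         lambda ctx, out: out.append(f"root:{ctx}")),
    Rule(["RUNWAY-NODE"], ["HAS-RUNWAYS"],
         lambda ctx, out: out.append(f"runway-node:{ctx}")),
]
reasoner = Reasoner(rules, {("HAS-RUNWAYS", "AIRPORT"): True})
proved = reasoner.achieve("OBSERV-GRAPH", "AIRPORT")
```

In this sketch the output list is built purely by side effects, in the
order in which subgoals are satisfied, while the search itself is
steered only by what has been asserted in the STM dictionary.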

Fig. 2 shows the flow of information from the Object
Graph to the Observability Graph. The assertion of the
consequents places them into Short Term Memory (STM). The
antecedents use stylized accesses of the Object Graph or STM. In
general an antecedent simply applies a relational operator to the
result of an access and some other value - either a fixed value
within the rule, or the result of some other access. Each
antecedent returns true or false. There is an implicit conjunction
of the antecedents. Consequents are stylised write instructions,
which write into short term memory. They can get values from
the side effect stage of a rule via a push down stack reserved
for that purpose. This is a generalisation of the technique used
in MYCIN of passing values from antecedents to consequents
via global variables (see Davis, Buchanan and Shortliffe
[1975]). The side effects of a rule are expressed as a general
piece of LISP code. The highly stylized nature of the
antecedents and consequents should ease the introduction of
meta-rules (Davis [1976]), which is tentatively planned for the
future. It is too restrictive, for the difficulty of the task involved,
to so stylize the side effects. Thus the meta-rules will most likely
have to be confined to examining just the antecedents and
consequents. This should cause no real problems however,
because the control and goal structure of the reasoning system is
embedded in precisely these parts of the rules.


Fig. S. Producing the Observability Graph

In producing the Observability Graph the Predictor and
Planner works with observables as defined from characterizing
low level feature description operators. It is guided by two
classes of quasi-invariants. Object class quasi-invariants broadly
determine the node structure of the Observability Graph.
Viewing condition quasi-invariants determine the detailed
contents of nodes, and the arcs and relations between nodes. Of
course this distinction is sometimes blurred. For instance a
mixture of the two types of quasi-invariants directs production
of the arc which says that runways should intersect a taxiway in
the image. The object model says that they should be connected,
and a rule which knows that connectivity is invariant under all
viewing angles deduces that intersection of the regions
corresponding to the nodes will be observable.

At  the  time  of  writing  we  have  developed  some  <2  rules, 
for use in the predictor and planner. These can be roughly
categorized into classes of rules that transfer values from the
Object Graph to STM, rules that build Observability Graph
node structure, rules for deducing simple shapes, rules for
symbolic arithmetic, rules to calculate relative positions of
objects, and rules about the camera model. Notice that this
categorization is in terms of both the actions, and the side
effects, but that the control mechanisms see the rules only in
terms of their actions.

The rules which transfer values from the Object Graph to
STM perform two tasks. Firstly they can greatly improve
efficiency and rule readability, as accessing data in the Object
Graph often requires following a long list of pointers. Accessing
the STM is done by an associative hashing mechanism. Thus
having rules refer to STM results in both efficient retrieval
(with a single complex original retrieval from the Object Graph),
and a clear and easy syntax within the rule. Many of these
simple lookup rules also carry out simple summarizing
operations. There are often many ways to represent the same
geometric shape in the Object Graph. This multiplicity is
motivated by a desire to provide a user with a natural and
intuitive modeling system. Some of the lookup rules map these
representations into named classes, which other rules know how
to handle. For instance both a square cross section and a
rectangular cross section with equal width and height might get
mapped into a single class SQUARE in STM (and other classes
too, as there are no uniqueness constraints). The classes are
represented in STM as descriptors, and something is a member
of the class only if it appears as the context of a triple with that
descriptor, with value T. The intent of the classes is to
summarize the situations of applicability for other rules.

Observability Graph nodes can either describe observable
shapes, or can be recursively complete Observability Graphs.
For instance for an airport Observability Graph there will be
nodes which describe the shape of runways and taxiways, and a
node which is the Observability Graph for aircraft. Arcs
between these nodes will describe their spatial relations,
including such facts as runways are connected to taxiways, and
aircraft can be found on runways and taxiways. The class of
rules being considered here decides what type of nodes should
be generated, and their premises cause goals to be set up to
calculate all the relevant details to fill in the node descriptions.
Since nodes can be quantified in a variety of ways (e.g. there are
from one to three runways in an airport) there are rules to
handle these quantifications.


So far the rules we have written for shape description
deal only with very simple three dimensional generalized cones,
flat straight ribbons (such as runways) and right circular
cylinders. Rules for a much larger class of shapes will be easily
written now that we have developed a framework, without
any new conceptual difficulties. Eventually as we tackle very
complex shapes, we will have to develop some new tools.
Currently the rules take into account any camera model which is
known (there are different rules for different levels of
knowledge) and predict the shape of a ribbon in the image. To
do this requires setting up subgoals to deduce the three
dimensional location of the shape in the model, and often
arithmetic subgoals. The latter occurs for instance in calculating
the apparent width of a ribbon arising from a right circular
cylinder. The radius of the model cylinder has to be multiplied
by two. This happens in the side effects, where the node in the
Observability Graph is being constructed. The problem is that
the radius may not be known exactly, but rather only as some
description - e.g. a range descriptor, or perhaps a histogram of
distribution of expected radii. The side effects of the particular
rule which deduces the shape of the right circular cylinders set
up a goal of multiplying the radius by 2. The backward
chaining mechanism is invoked recursively on this subgoal.
There are rules which know how to multiply different
representations of quantities. Currently there is a rule which can
multiply simple numbers (by invoking the LISP multiply
function in its side effects) and a rule which can multiply an
interval by a number. The premises of these rules control which
one will actually get invoked. This is a
good example of the additivity of knowledge that the rule
system gives. As we include more representations for quantities,
we will not need to go and change all the rules which deal with
quantities, such as the one mentioned above, but merely add
new rules which can carry out the primitive operations, such as
multiplication, which we wish to carry out upon quantities.
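
The additivity argument can be made concrete with a small Python
sketch of premise-selected multiplication rules. The `Interval` class,
the rule registry, and the function names are assumptions introduced
for illustration; the two cases shown (a plain number and an interval
times a number) are the two rules mentioned in the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Interval:
    low: float
    high: float

# Each rule is a (premise, action) pair; the premise decides applicability.
MULTIPLY_RULES: List[Tuple[Callable, Callable]] = []

def multiply_rule(premise: Callable):
    """Register an action under the given premise (hypothetical helper)."""
    def register(action: Callable) -> Callable:
        MULTIPLY_RULES.append((premise, action))
        return action
    return register

@multiply_rule(lambda q, k: isinstance(q, (int, float)))
def multiply_number(q, k):
    return q * k

@multiply_rule(lambda q, k: isinstance(q, Interval))
def multiply_interval(q, k):
    a, b = q.low * k, q.high * k
    return Interval(min(a, b), max(a, b))   # a negative k swaps the bounds

def multiply(quantity, k):
    """Try the rules in turn; the first whose premise holds is invoked."""
    for premise, action in MULTIPLY_RULES:
        if premise(quantity, k):
            return action(quantity, k)
    raise TypeError(f"no multiplication rule for {quantity!r}")

def apparent_width(radius):
    """Width of the ribbon from a right circular cylinder: twice the radius."""
    return multiply(radius, 2)
```

Supporting a new representation, such as a histogram of expected radii,
would mean registering one more rule; `apparent_width` itself would not
change.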

One way to reason about spatial relations is via matrices
of numbers, and matrix arithmetic. The graphics generator
module of ACRONYM takes this approach. It is successful in
carrying out the tasks desired of it because it has embedded in it
general numerical formulas which hold over the whole range of
spatial orientations. In the predictor and planner however we
are dealing with symbolic descriptions of shapes. Often it is
possible to make use of special orientations, and make
deductions which hold only for special cases, rather than the
general case. Thus there are rules which deduce spatial relations
between special named cases, such as the ground plane, a
vertical plane, the class of horizontal planes, etc. By carrying out
spatial manipulations at this level, explicit knowledge about
orientations and positions can be preserved across transforms. A
strictly numerical approach would result in this explicit
knowledge being lost in an array of numbers, or at best, it might
have to be re-extracted by routines which could recognize
special cases of numeric arrays.

Interpretation

The Interpreter used thus far is a state of the art graph
matcher which was described in previous reports. The
Interpreter matches the Observability Graph to the Picture
Graph. Semantics of the problem domain are encoded in the
Observability Graph. Ordering of the matching process for
efficiency is accomplished with the Observability Graph, also.

There are several ways that efficient matching is carried
out. First, the matching process determined by the Observability
Graph is tailored to individual problems. An Observability
Graph is made for the particular objects and viewing conditions
of each problem. There is no fixed set of operations which are
performed whether relevant or not. Second, the Observability
Graph encodes a coarse to fine matching. For aircraft, this
means that first wings and fuselage are matched, then smaller
detail like engines and horizontal stabilizers. These levels of
matching correspond to levels of detail in the Observability
Graph. The structure of the Observability Graph reflects levels
of detail in the Object Graph, the main parts of which are
fuselage and wings. However, Observability Graph structure
depends on knowledge about the process of observation. For
example, the smooth intensity gradient along the curved edge of
the fuselage may cause problems for most edge operators. In
SAR images, wings and fuselage may not show up. The
matching process implements this coarse to fine matching
structure. We plan soon to implement an improvement to
explicitly use the fuselage locations when searching for the
wings, and subsequently use these locations to constrain the
search space for the stabilizers and engine pods. We intend to
incorporate efficient scheduling in a system of rules.

We are working to formalize the interpretation. In
previous work, matching has been represented as graph
embedding (Barrow). That is, consider a graph description of
an expected image of the object, G; then, matching seeks an
isomorphism of G with a subgraph of the feature graph of the
picture, what we call the Picture Graph. This representation
leads to efficient algorithms for matching, but there are several
problems with this approach. The first is that feature descriptor
algorithms have serious limitations; e.g. edge finders miss edges.
The second is more fundamental: objects may be camouflaged,
or have snow or rain on their surfaces. Even more fundamental
problems arise when we seek generic object classes.
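
Matching as graph embedding, and its sensitivity to a missed edge, can
be illustrated with a brute-force Python sketch. It is a plain
backtracking search written for this illustration, not the matcher
used in ACRONYM; the node labels and the requirement that every model
arc be present between the matched picture nodes are assumptions.

```python
from typing import Dict, FrozenSet, Optional, Set

Arcs = Set[FrozenSet[str]]

def find_embedding(model_nodes: Dict[str, str], model_arcs: Arcs,
                   picture_nodes: Dict[str, str], picture_arcs: Arcs
                   ) -> Optional[Dict[str, str]]:
    """Return a map from model nodes to distinct picture nodes that
    preserves labels and all model arcs, or None if none exists."""
    names = list(model_nodes)

    def extend(assignment: Dict[str, str]) -> Optional[Dict[str, str]]:
        if len(assignment) == len(names):
            return dict(assignment)
        m = names[len(assignment)]           # next model node to place
        for p, label in picture_nodes.items():
            if label != model_nodes[m] or p in assignment.values():
                continue
            # Every model arc from m to an already placed node must
            # also appear between the corresponding picture nodes.
            consistent = all(
                frozenset((p, assignment[other])) in picture_arcs
                for other in assignment
                if frozenset((m, other)) in model_arcs
            )
            if consistent:
                assignment[m] = p
                result = extend(assignment)
                if result is not None:
                    return result
                del assignment[m]            # backtrack
        return None

    return extend({})

# Hypothetical example: a fuselage ribbon connected to a wing ribbon.
model_nodes = {"fuselage": "long", "wing": "wide"}
model_arcs = {frozenset(("fuselage", "wing"))}
picture_nodes = {"r1": "long", "r2": "wide", "r3": "wide"}
picture_arcs = {frozenset(("r1", "r3"))}
match = find_embedding(model_nodes, model_arcs, picture_nodes, picture_arcs)
```

If the connecting arc is missing from the picture description, for
instance because an edge finder missed an edge, the same call returns
None, which is the first limitation noted above.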

One approach to dealing with anticipated differences is to
define a distance metric. This approach adds very little
capability to the initial matching technique, although it adds
considerable complication in conceptualization and computation.
Distance thresholds large enough to accommodate markings and
configuration differences offer little discrimination.

One approach to formalization will be presented in a
forthcoming memo. Another approach to matching is
generalized invariance, described here by analogy with
generalized translational invariance. Generalized translational
invariance is the principle underlying generalized cones.
Consider two cross sections. The guiding principle was not
equivalence of two cross sections within a distance measure
under a suitable translation, but instead congruence of the two
cross sections under a translation and normalizing
transformation.


We represent the matching process as

T*O = I*P,

that is, an isomorphism of a subgraph of volumes obtained
from the Object Graph with a subgraph of volumes lifted from
the Picture Graph. Let us explain. O is the Object Graph and
P is the Picture Graph. O may be a generic or specific object,
that is it might represent the class of jet passenger aircraft or a
747. T is a transformation from the 3d Object Graph to a
closely related 3d model. I is a 3d interpretation operation which
maps from 2d image features, ribbons or edges, to 3d volumes,
generalized cones. I also maps from surface features obtained
from stereo to 3d volumes.

For example, consider the class of coffee cups. They
include styrofoam conical cups and handmade pottery
specimens. They differ greatly in shape, volume and trim; some
have handles, others not; some are cylinders, others distinctly
conical. A generic description for this class is a functional
description. From the beginning of our work with generalized
cones, our guiding principle for dealing with objects has been
form - function. Freiling demonstrated an interesting program
based on this principle (unpublished). Concretely, that means
that if we can describe the function, we can describe the
relevant form, and use shape to represent classes with enormous
variation. In this example, a coffee cup is a container for fluids
which is to be held in a human hand. This gives a generic
specification of the class of cups: it is a container, hence enclosed
except vertically. From a little more knowledge we infer it is
open at the top and flat on the bottom, and something about the
material it is made from. We infer its approximate volume and
its approximate diameter from human measurements of thirst
and hand size. From a knowledge of fabrication methods we
infer its usual circular cross section, and flat top. In this
discussion, O is the class of coffee cups, and T is the
transformation from O to the generic description we arrived at:
a hollow circular cone with diameter approximately specified,
flat ends, closed bottom, open top, vertical orientation, made of
one of a few materials. We believe that this inference capability
provides powerful generality at the expense of a moderate
knowledge base. ACRONYM does not have this capability now.

A measure is required to compare alternative
interpretations. There are many examples in which humans
ignore feasible but unreasonable interpretations of scenes. We
assume that a set of preferred interpretations are used and that
they are in some sense canonical. We assume that the
comparison does not involve small differences in a distance
measure, but major differences in structural descriptions
(Nevatia 1974, Nevatia and Binford).

The interpretation map I includes operators which predict
obscuration, shadow and illumination, and the effects of
perceptual operators including missing edges, limited resolution,
and noise. These are all quantifiable and their consistency in
the final interpretation can be tested. The advantage of this
formulation over a definition of a match as a thresholded
distance function is that weights in the evaluation depend
greatly on object semantics for each element of the match.
Surface markings do not relate to characterizing underlying
volumes, hence they do not enter into the match evaluation.
Interpretations will thus be of the form: this is a 747 with the
nose section obscured by a passenger ramp. The wing is
obscured partially by the fuselage and there are markings.
Interpretation operators may be top-down or bottom-up.

Experiments

In this section we briefly review some tasks which have
already been carried out by ACRONYM. The system has not
yet found an instance of an object in a digitized image given
only a high level description of the object class and low level
descriptors of the image. However, that achievement appears
near. All the necessary mechanisms for such a test are
operational in at least prototype form, and have been tested
individually or as subsystems smaller than the total
ACRONYM system. As more rules are written for the predictor
and planner we expect to soon be able to run a complete test.

The High-level Modeler has been used to construct a
large number of models. Examples of output from the Graphics
Generator have been presented in previous workshop papers.
The Modeler and Graphics controller are completely interactive,
enabling the user to 'fly' around airports or more generally to
orient a partially built model quickly and easily as one would
orient a physical model by hand, to examine any desired detail.

The Predictor and Planner has been run on a specific
model of an aircraft and a generic model of an airport. It
produced the complete node structure of the Observability
Graphs in both cases, and included quantifiers in the airport
case, to describe the allowable numbers of runways
and taxiways in a valid airport instance. In separate tests the
Predictor and Planner has deduced the expected shapes of
simple generalized cones.

The Matcher has been tested on a hand coded
Observability Graph, matching against a hand coded Picture
Graph. This test was used during the debugging phase, and was
designed to test all modes of operation of the Matcher. Thus the
particular Observability Graph used is more complex than can
reasonably be expected to be generated by the Predictor and
Planner, at least until we have had considerably more
experience.


Fig. I. An L1011.

Recent work on the edge mapping module has produced
the results reported in greater detail elsewhere in these
proceedings (Brooks [1979]). Figure S shows the results of
simulating the matcher by hand to control the edge mapper.
The ribbons produced are a result of three invocations of the
edge mapper: one invocation to find candidate ribbons for
wings and fuselage (it found the two wings, the fuselage, a
passenger ramp, and a large shed), one to find the rear
stabilizers and one to find the engine pods. The last two
invocations were directed by the inferences made as a result of
the first.

Work is also underway to transport the line finding
programs of Nevatia and Babu (1978) to our laboratory. This
added self reliance should allow us much more flexibility with
our experimental procedures, and also give us the option of
forcing some goal-direction down to the line finding level. Such
a capability should prove useful when the predictor and planner
has an understanding of the process of shadow formation. The
line finder will be able to be directed to search with more global
information for lines in low contrast areas.

Other work on stereo, sponsored by ARPA, is proceeding
at our lab. We intend to merge these results into ACRONYM,
in the surface mapping module. Depth information will give us
surface descriptions to match against, and enable us to use much
more of the three dimensional information contained in our
models.

ABSTRACT

There have been many different approaches
to texture description, primarily statistical
techniques, although there has been some work on
structural texture analysis all along. We
present here a technique which can be used to
easily derive parts of the structural
description - the regularity information in
particular. Some limits on this method and its
use in an overall texture description system are
discussed.


INTRODUCTION 

Many times, areas of an image are best
characterized by their texture rather than
purely intensity information. Texture is most
easily described as the pattern of the spatial
arrangement of different intensities (or
colors). The different textures in an image are
usually very apparent to a human observer, but
automatic description of these patterns has
proved to be very complex. We are concerned
with a description of the texture which
corresponds, in some sense, to a description
produced by a person looking at the image.

Many statistical textural measures have
been proposed in the past [1-4]; therefore one
can use some of their results indicating what
measures may be useful. Among the statistical
measures which have been discussed, and used,
are analysis of the discrete Fourier transform
to find indications of the structure [4],
analysis of generalized gray-level co-occurrence
matrices [1], and analysis of the edges (or
micro-edges) in a subwindow [3]. We are not
interested  in  finding  one  texture  measure  which 
will  distinguish  between  all  regions  (this  is 
the  ultimate,  but  extremely  difficult  problem) 
but  in  finding  a texture  measure  to  use  in 
conjunction  with  many  other  features  of  the 
region  [S»] . 


*This research was supported by the Advanced
Research Projects Agency of the Department of
Defense and was monitored by the Wright Patterson

The work in what can be called structural
texture description has been more limited [5-7].
Maleson [5] used simple regions as the basic
elements and used relations between regions and
shape properties of the region in his analysis.
Tamura et al. [6] tried to develop a set of
operators which would rate textures on several
scales, comparable to their ratings by human
subjects. The proposals of Marr [7] for texture
analysis based on the primal sketch are similar
to some of the analysis which we perform.

ANALYSIS  OF  TEXTURE 

One  of  the  most  striking  patterns  seen  in 
aerial  images  of  a certain  scale  is  the  regular 
street  or  housing  pattern  of  many  cities  (see 
Fig.  1).  The  appearance  of  this  regularity  is 
its  most  distinguishing  characteristic,  and 
because  the  pattern  is  so  clear  in  the  image  it 
should  be  easy  to  extract.  An  obvious  method  to 
extract  this  regular  pattern  is  the  use  of  a 
2-dimensional  discrete  Fourier  transform.  We 
computed  this  for  various  subwindows  from  the 
image  in  Fig.  1 and  other  images  (subwindows  are 
given  in  Fig.  2).  In  the  Fourier  transform 
results  shown  in  Fig.  3 there  is  some  indication 
of  the  regular  structure  in  the  urban  area 
windows,  but  it  is  not  as  apparent  (visually)  as 
it  is  in  the  image.  Other  attempts  to  derive 
much  of  the  structural  information  from  the 
Fourier  transform  were  only  partially  successful 
[4], so we felt other methods should be
attempted. 

The individual textural elements could be
located and analyzed [5], but the simple regions
seem to be unreliable when the textural elements
are very small, which is the case in the urban
areas. Another option is to analyze an edge
image to find the structure. The patterns in
the original image will cause related patterns
to appear in the edge image, and those patterns
should be more consistent and easier to analyze
than the original image data.

To study textures which are composed of
small basic elements, a small window size edge
detector must be used. We are interested in the
edges between adjacent textural elements and not
so much in edges between adjacent textural
patterns. The edge operator which we use has
been used successfully for other types of


is indicated by 10 and no edge/edge by 01.
Finally 00 means no edges at either point. The
10 and 01 combinations mean the same thing in
terms of the image and thus are combined. The
most important numbers are the 11 totals. The
absolute magnitude is not very meaningful since
this depends on the total number of edges and on
the spacing being used (within a given image
there are more opportunities for a co-occurrence
of edges with a small spacing than a large spacing)
in addition to the actual frequency of
occurrence of 11's. One good way to normalize
the numbers seems to be to divide the 11 total by
the total of 10, 01, and 11. This gives the proportion of
potential edges for co-occurrence that actually
co-occur. We computed these values for 4
directions and spacings from 2 to 32 (at 45° and
135° a spacing of 2 is plotted at a distance of
2√2). Some of these results are given in
Fig. 7.
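
The normalized measure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the program used for the experiments: the function name `edge_cooccurrence`, the use of NumPy, and the (dx, dy) displacement convention are all hypothetical, and the edge direction restrictions discussed in the text are not included.

```python
import numpy as np

def edge_cooccurrence(edges, dx, dy):
    """Fraction of potential co-occurrences that actually co-occur.

    edges  : 2-D binary array, nonzero where a thinned edge element is present.
    dx, dy : displacement (columns, rows) from the first point to the second.
    Returns N11 / (N10 + N01 + N11), or 0.0 if no edges are involved.
    """
    edges = np.asarray(edges, dtype=bool)
    h, w = edges.shape
    if abs(dx) >= w or abs(dy) >= h:
        return 0.0  # displacement larger than the window: no pairs exist
    # Overlapping parts of the window and of its displaced copy.
    first = edges[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    second = edges[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    n11 = np.count_nonzero(first & second)    # edge at both points
    n_any = np.count_nonzero(first | second)  # N10 + N01 + N11
    return n11 / n_any if n_any else 0.0
```

Plotting `edge_cooccurrence(edges, s, 0)` for s = 2, ..., 32 would give a curve of the kind described for the horizontal direction; for a diagonal direction a displacement of (s, s) pixels corresponds to a distance of s√2.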

There are several ways to compare edges at
two points, with different features indicated by
the different comparison methods. Using all
edges for every direction presents severe
problems in the analysis of the output since
long lines running in the same direction as the
co-occurrence computation will be included along
with lines running perpendicular to the
direction. (Tamura et al. [6] used this
feature to determine linear patterns in their
texture experiments.) But the edge element
directions are available and can be used to
separate these two different patterns. The
first step is to consider only those edge
elements perpendicular to the direction of
search; that is, in the computation of
co-occurrences in a horizontal direction only
vertical edges are considered. There are an
almost unlimited number of variations on this
basic restriction which can either be derived
from other variations or computed in a manner
similar to the simple cases. The variations
include: allow some freedom in the edge
direction (45° either way), accept only perfect
matches (up and up, down and down), accept only
opposites (up and down, not up and up), and
allow some freedom in the direction of the last
two. The different combinations will all produce
results with different information, so that
several different ones can be computed.

DISCUSSION 


analysis [8]. The operator is applied over a
3x3 window and generates an edge magnitude and
direction (1 of 8 directions). The direction is
defined so that the brighter side is to the
right when facing in the direction of the edge.
Figure 4 shows the result of applying this
operator to each of the subwindows in Fig. 2.
The edge data must be further processed before
it is in a form usable in texture analysis.
Since an edge in the image appears as a broad
peak in the edge detector output (the width in
this case is two for a perfect step edge), the
edges must be thinned. For the experiments here
a simple non-maximal suppression was applied in
2 directions (horizontal and vertical), but a
more sophisticated suppression which considers
the directions of the edge elements could also
be applied [8].
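
A thinning step of this general kind might look as follows. The text does not give the exact rule, so this sketch makes several assumptions: a point is kept if its magnitude is a local maximum along the row or along the column, ties on a two-wide plateau are broken toward the left or upper point, and `thin_edges` and its `threshold` parameter are hypothetical names.

```python
import numpy as np

def thin_edges(mag, threshold=0.0):
    """Binary edge map from an edge-magnitude array by simple
    non-maximal suppression in the horizontal and vertical directions."""
    mag = np.asarray(mag, dtype=float)
    # Replicate border values so border points are compared with themselves.
    p = np.pad(mag, 1, mode="edge")
    left, right = p[1:-1, :-2], p[1:-1, 2:]
    up, down = p[:-2, 1:-1], p[2:, 1:-1]
    # Strict on one side, non-strict on the other: a plateau of width two
    # (the response to a perfect step edge) is reduced to a single point.
    horiz_max = (mag > left) & (mag >= right)
    vert_max = (mag > up) & (mag >= down)
    return (mag > threshold) & (horiz_max | vert_max)
```

With this tie-breaking rule a vertical step edge whose response is two pixels wide is thinned to a single column of edge elements.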


The suppressed edge images retain the
regularity of the initial image, but now the
regularity is in the spacing of edge elements,
not texture elements. A Fourier transform
applied to this binary edge image would indicate
the repetitive nature of the binary image, but
this is obscured by the degeneracies introduced by
the binary nature of the input. Generalized
gray level co-occurrence computations [1] have
been studied for texture analysis, and were
intended to indicate sizes of textural elements
involved in the pattern. These can be applied
more easily to a binary image than a general
intensity image to indicate the spacing of
edges.

EDGE CO-OCCURRENCE ANALYSIS


Generalized gray level co-occurrence matrix
analysis is a basis for much of the statistical
texture analysis. Basically, a set of matrices
is computed for a portion of the image, one for
each selected spacing and angle. The entry in
the matrix at row I and column J is incremented
each time the first image point has the value I
and the point at the given spacing and direction
has the value J. Usually the image values are
partitioned into a small set of values (8 rather
than 256), so that it is even possible to
compute the initial matrix. Also the
computation is applied for many spacings
(1, 2, 3, 8, etc.) and several directions
(0°, 45°, 90°, etc.) as shown in Fig. 6. Because
of the large number of large matrices that are
generated by this method, various measures are
computed on the matrix values, and the
classification is performed using these measures
[1]. The common and useful measures do not seem
to capture the important feature in the edge
images: the regular spacing of edge elements,
but this is available in the co-occurrence
matrix itself.
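
As an illustration of the matrix computation just described, here is a minimal sketch for a single spacing and direction. It assumes NumPy, integer gray levels in the range 0 to `max_value` - 1, and uniform quantization; the function name and parameters are hypothetical and are not taken from [1].

```python
import numpy as np

def cooccurrence_matrix(image, dx, dy, levels=8, max_value=256):
    """Gray level co-occurrence matrix for one displacement (dx, dy).

    Entry [I, J] counts the point pairs whose first point has quantized
    value I and whose second point, displaced by (dx, dy), has value J.
    """
    img = np.asarray(image).astype(np.int64)
    q = (img * levels) // max_value  # partition into `levels` values
    h, w = q.shape
    m = np.zeros((levels, levels), dtype=np.int64)
    if abs(dx) >= w or abs(dy) >= h:
        return m  # no point pairs at this displacement
    first = q[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    second = q[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    # Unbuffered accumulation so repeated (I, J) pairs are all counted.
    np.add.at(m, (first.ravel(), second.ravel()), 1)
    return m
```

For a binary edge image (`levels=2`, `max_value=2`) the matrix reduces to the four counts 00, 01, 10 and 11.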


None  of  this  analysis  would  be  worthwhile 
if  it  did  not  make  the  job  of  describing  regular 
textures  any  easier.  The  highly  regular 
patterns  of  the  San  Francisco  urban  area  (the 
top  row  of  Figs.  2-5  and  Fig.  7a,  7b)  and  raffia 
(the  bottom  row  and  Fig.  7c,  7d)  produce  strong 
periodic  patterns  in  the  plot  of  the 
co-occurrence  measure.  A high  value  in  the 
graphed  measure  indicates  that  edges  frequently 
occur  at  that  particular  spacing.  This  spacing 
information can be used to determine the size
and  spacing  of  the  textural  elements,  and  the 
overall  strength  of  the  peak  can  be  used  to 
determine  how  regular  the  pattern  is. 


When binary edge images are used for
co-occurrence analysis, many simplifications in
the computation can be made. We will use a 1 to
indicate an edge at a given point, and 11 to
indicate edges occurring at both the first point
and the second point which is at some distance
and angle from the first. The edge/no edge pair

The spacing of pairs of textural elements
is given by the peak to peak spacing using the
measure which matches edges only in the exact
same direction (as in Fig. 7a,c). The size of
individual elements is best given by the measure
which allows only edges in the opposite
direction (as in Fig. 7b,d). The solid line in
the graph indicates the size of dark objects and
the dotted line the size of bright objects. The
size is taken from the first major peak; the
succeeding peaks are caused by the repeated
pattern. By comparing the results from the 4
directions, the orientation of the texture can
be predicted. Since patterns usually do not
line up with one of the 4 directions there will
be some contribution to 2 of the directions.
When these directions are 45° apart the dominant
direction is probably between them (as in San
Francisco, Fig. 7a,b). But when they are 90°
apart there should be a regular pattern in two
directions (as in Raffia, Fig. 7c,d). Thus,
from the data we can say that the San Francisco
subwindow has a regular pattern of bright and
dark regions oriented in one direction, near
45°, with the bright regions being larger (width
about 10 pixels) than the dark ones (width about
4). Note that the size of the blocks in the
other direction is near the size limit of the
co-occurrence computation and also that very few
of the edges at the ends of the blocks are
detected.

The irregular textural patterns (e.g., the
suburban areas of the second row of Fig. 4, and
the grass and sand of the third row, first and
second windows) do not produce the same clearly
periodic patterns of raffia, as shown by
Fig. 8a,b (for grass and suburban,
respectively). But it is possible to derive
certain useful features from these results,
primarily that of the size of the textural
elements. The strong peak near 3 for grass and
4 or 6 for suburban indicates a dominant size
for textural elements (in the case of suburban
probably 2 different sizes). The graphs
indicate that the grass has thin dark and bright
textural elements, predominantly vertical and, to
a lesser extent, horizontal. The suburban area
has only bright regions, somewhat larger. These
descriptions still leave open the question of
whether the textural elements are long and thin
or small and round. The lack of a substantial
peak in the 45° or 135° direction for grass
indicates that it is probably long and thin, and
the small, though readily apparent, peak in the
graphs for the suburban windows indicates that
the regions are probably small and round (or more
likely, rectangular).

This is not a complete description of the
textures, but serves as a good initial
description of the patterns. There are still
other important features of the textures which
are not derived by this method, but could be
computed by other techniques. This procedure
has been applied on many other irregular and
nonregular patterns with results similar to
those for the windows in Fig. 2 [11].


CONCLUSIONS

General  texture  analysis  is  a very 
difficult  problem,  but  this  analysis  of  edge 
images  appears  to  be  an  effective  method  to 
extract  many  important  structural  features  from 
the  textural  patterns.  One  major  unanswered 
question  is  whether  or  not  all  of  the 
information  derived  by  the  human  user  can  be 
reliably derived by a program. We are still
working on the automatic extraction of this
information from the data which is produced by
this textural analysis method.

Fig. 2. 16 Subwindows for Texture Analysis

Fig. 5. Non-maximal Suppressed Edges from Fig. 4


ABSTRACT

A blob is a compact region lighter (or darker)
than its background, surrounded by a smoothly
curved edge. Relaxation can be used to enhance
the blob's interior, i.e., initial "light" and
"dark" "probabilities" can be made less
ambiguous. Similarly, it can be used to enhance the
blob's edge, i.e., initial (oriented) "edge" and
"no edge" probabilities can be disambiguated.
This paper uses two cooperating relaxation
processes to improve both sets of probabilities by
allowing them to interact with each other as
well as with themselves; for example, "light"
and "edge" mutually reinforce if the light point
is on the light side of the edge. Results
obtained in this way are better than the results
obtained using either relaxation process alone.


INTRODUCTION 

Segmentation of objects from their
background is basic to many image understanding tasks.
This paper deals with the relatively simple case
in which the object is a "blob", i.e., a compact
region, generally lighter (or darker) than its
background, surrounded by a smooth edge. It
points out how even this case cannot always be
handled by elementary segmentation methods, and
describes a compound relaxation process that
makes use of both light/dark probabilities and
edge/no edge probabilities to discriminate blobs
from their background.

In principle, blobs can be extracted by
thresholding the image at an appropriate level.
However, if the image is noisy, thresholding will
produce noisy results which may or may not be
repairable by postprocessing. Moreover,
thresholding may extract regions that are not bounded
by edges, but are smooth continuations of the
background, if the gray level fluctuations in
the background happen to cross the threshold
level.

Edge detection is sometimes useful in object
extraction. Here again, however, there may be
many edge detector responses in the interior of
the object or background due to noise, and there
may fail to be sufficiently strong responses on
the object/background border due to blur.


Relaxation [1,2] has been used to improve
the results of both thresholding [3] and edge
detection [4,5]. To apply relaxation to
thresholding, we initially assign "light" and "dark"
probabilities to the image points based on their
gray levels. We then iteratively adjust these
probabilities at each point based on the
probabilities at the neighboring points, i.e., light
reinforces light and dark reinforces dark. This has the
effect of shifting the probabilities initially
assigned to noise points so as to make them
more consistent with their surroundings.
Eventually, the light probabilities at all points
of a light region should become uniformly high,
and vice versa, so that thresholding becomes
easy, and should produce non-noisy results.

Note,  however,  that  the  process  may  still  extract 
regions  that  are  not  bounded  by  edges. 

To apply relaxation to edge detection, we
initially assign "edge" and "no edge"
probabilities to each image point (or, alternatively, to
each adjacent pair of points) based on the relative
values of the gray level differences in various
directions around the point. We then iteratively
adjust these probabilities based on the
probabilities at neighboring points: no edge reinforces
no edge; edge reinforces edge if they smoothly
continue one another, and reinforces no edge (and
vice versa) if they are alongside one another.

This has the effect of strengthening the
appropriate edge probabilities at points that lie
along smooth edges, and strengthening the no
edge probability elsewhere, so that edge
detection should yield less noisy results.

As we shall see in this paper, further
improvement in the quality of the results is
obtained if we use both the light/dark and edge/no
edge relaxation processes, and also allow them to
interact with each other. For example, "light"
and "edge" at a pair of neighboring points
reinforce one another if the light point is on the
light side of the edge point, but they weaken
one another if it is on the dark side. The
details of this joint probability adjustment
process will be given in the next section.

The joint (light/dark)/(edge/no edge)
relaxation process can be regarded as making use of
convergent evidence in (probabilistic)
segmentation. Gray level and edge value are used jointly
for segmentation in the (nonprobabilistic)
"Superslice" scheme [6-8], among others. In this scheme
a set of thresholds is applied to the image; for
each of these, the connected components of
above-threshold points are extracted; and we call a
component an "object" if there are many local
maxima of edge strength around its border. Thus
Superslice first makes two independent decisions
based on gray level (thresholding) and edge
strength (selection of maxima), and then checks
them against each other; when they agree, we say
that an object has been detected. The joint
relaxation process described in this paper, on the
other hand, never makes decisions; it estimates
probabilities based on the gray levels and edge
strengths, and then iteratively adjusts these
probabilities so that both types of information
are able to interact. This process thus combines
the convergent evidence principle with the
principle of deferred commitment. As Marr has pointed
out, both of these principles are very desirable
characteristics of an image understanding system.


in  which  these  classes  lie,  are  described  in  [3]. 

Figure 1 shows another FLIR image of a blob
and the results of applying eight iterations of
the light/dark relaxation process to it. We see
that the contrast between the blob and its
background has been greatly enhanced, but that some
of the light patches and spots in the background,
which do not seem to be bloblike (i.e., they lack
good edges), have also been somewhat enhanced. We
shall next see what happens when the edge
relaxation process, and then the joint process, are
applied to Figure 1.

EDGE  RELAXATION 

Let $e_i$ ($i = 1, \ldots, 8$) be a measure of the gray
level difference at point P in direction $45i^\circ$.
(In the experiments described below, we used the
set of masks


LIGHT/DARK  RELAXATION 

Let $g$ be the gray level of point P, and
let $b, w$ be the lowest and highest gray levels
in the image, so that $b \le g \le w$ for all P. We
take $p_w = (g-b)/(w-b)$ as the estimate of the probability
that P is white, and $p_b = (w-g)/(w-b)$ as the probability
that P is black.

The compatibilities between the black and
white probabilities at adjacent points can be
estimated by $\overline{p_i p_j} / (\bar{p}_i \bar{p}_j)$, where $i, j = b$ or $w$. Here
$\bar{p}_i$ and $\bar{p}_j$ are the average probabilities of colors
$i$ and $j$, while $\overline{p_i p_j}$ is the average of the
product of these probabilities taken over all
pairs of adjacent points. These compatibilities
can then be used as coefficients in the relaxation
process described in [9]. (Alternatively, their
logs, suitably rescaled, can be used as
coefficients in the process described in [1]; the
results obtained are essentially the same [3].)
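
The initial probabilities and the compatibility estimate can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes NumPy, takes "adjacent" to mean horizontally or vertically adjacent, counts each pair in both orders, and uses hypothetical function names. The relaxation update itself (the process of [9]) is not reproduced here.

```python
import numpy as np

def light_dark_probabilities(gray):
    """Initial white and black probabilities from gray levels."""
    g = np.asarray(gray, dtype=float)
    b, w = g.min(), g.max()
    if w == b:
        raise ValueError("image has a single gray level")
    p_w = (g - b) / (w - b)
    return p_w, 1.0 - p_w  # p_b = (w - g) / (w - b)

def compatibility(p_i, p_j):
    """Average of p_i(P) * p_j(Q) over adjacent pairs (P, Q), divided by
    the product of the average probabilities of the two labels.
    Assumes both averages are nonzero."""
    products = [
        p_i[:, :-1] * p_j[:, 1:],   # Q is the right neighbor of P
        p_i[:, 1:] * p_j[:, :-1],   # Q is the left neighbor of P
        p_i[:-1, :] * p_j[1:, :],   # Q is the lower neighbor of P
        p_i[1:, :] * p_j[:-1, :],   # Q is the upper neighbor of P
    ]
    mean_product = np.concatenate([x.ravel() for x in products]).mean()
    return mean_product / (p_i.mean() * p_j.mean())
```

A value above 1 indicates that the two labels tend to occur together at adjacent points, and a value below 1 that they tend to exclude one another; on an image with two uniform halves, `compatibility(p_w, p_w)` exceeds 1 while `compatibility(p_w, p_b)` falls below it.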

Figure 4 of [10] showed the results of
eight iterations of the relaxation process just
described, applied to a FLIR image of a tank. In
these figures, the probabilities have been
redisplayed as gray levels, i.e., $g = b + p_w(w-b) = w - p_b(w-b)$.
The histogram gradually turns into
a pair of spikes at opposite ends of the
grayscale, and the resulting discrimination between
tank and background (it is not a segmentation,
since we have not actually thresholded) is quite
good.


It should be pointed out that in this simple
scheme, if most of the points in the tank and the
background had gray levels in the same half of
the grayscale, the relaxation process would have
driven them both to the same end of the
grayscale. More general gray level relaxation
processes which do not require that there be only
two classes ("light" and "dark"), and do not
depend for their success on restricting the ranges

to compute the e's.) Let $E$ be the largest $e$
value at any point of the image, and let $e_{\max}$ be
the largest value at the point P. We take
$p_e = e_{\max}/E$ as an estimate of the edge probability
at P, and $p_n = 1 - p_e$ as an estimate of the no edge
probability. Moreover, we take $p_i = (e_i/s)\,p_e$,
where $s = \sum_{i=1}^{8} e_i$, as an estimate of the probability
of an edge in direction $45i^\circ$ at P; thus
$\sum_{i=1}^{8} p_i = p_e$.

The pairwise compatibilities among $p_1, \ldots, p_8, p_n$
are estimated using ratios of average
probabilities, as in the preceding section, and these
are used as coefficients in the relaxation process
of [9].
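
The initial edge and no edge probabilities can be computed directly from the eight directional difference values. The sketch below assumes NumPy, assumes the differences are nonnegative (for example, negative mask responses set to zero, which the text does not state), and uses a hypothetical function name; the masks themselves are not reproduced.

```python
import numpy as np

def edge_probabilities(e):
    """Initial edge probabilities from directional differences.

    e : array of shape (8, H, W); e[i-1] holds the difference in
        direction 45*i degrees at each point (assumed nonnegative).
    Returns (p_dir, p_e, p_n), where p_dir has shape (8, H, W).
    """
    e = np.asarray(e, dtype=float)
    e_max = e.max(axis=0)  # largest value at each point P
    E = e_max.max()        # largest value anywhere in the image
    if E <= 0:
        raise ValueError("no nonzero difference values")
    p_e = e_max / E
    p_n = 1.0 - p_e
    s = e.sum(axis=0)
    # Share p_e among the eight directions in proportion to e_i / s;
    # points with s = 0 get zero probability in every direction.
    share = np.divide(e, s, out=np.zeros_like(e), where=s > 0)
    return share * p_e, p_e, p_n
```

By construction the eight directional probabilities at a point sum to `p_e`, so together with `p_n` they sum to 1.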


Figure 2 shows the initial $p_e$ values for
the same image as in Figure 1, rescaled as gray
levels (i.e., $g = b + p_e(w-b)$), and eight iterations
of the edge relaxation process applied to these
initial values. We see that the edges of the blob
are significantly enhanced, but some edges also
begin to come out of the background.


JOINT (LIGHT/DARK)/(EDGE/NO EDGE) RELAXATION


Suppose now that we define compatibilities
between $p_b, p_w$ and $p_1, \ldots, p_8, p_n$ in the same manner
as in the previous two sections. Given the two
initial sets of probabilities for each point, we
can then compute adjusted probabilities; but we
normalize each of the two sets separately, i.e.,
at each iteration we divide the new estimates
$p_b'$ and $p_w'$ by $p_b' + p_w'$, and the new estimates
$p_1', \ldots, p_8', p_n'$ by $p_1' + \cdots + p_8' + p_n'$.

The results of doing this for the image used
in Figures 1-2 are shown in Figure 3. This
consists of pairs of images, in one of which $p_w$ is
displayed as a gray level, and in the other $p_e$.
We see that the use of the joint process has
inhibited the growth of edges in the background
(compare Figures 2 and 3). However, it has not
entirely inhibited the emergence of white patches,
not bounded by edges, from the background. This
is because (understandably!) the no-edge
probability has no inhibitory effect on either the light
or dark probability.

Improved results are obtained if we
initialize $p_b$ and $p_w$ differently, so that $p_w$
is initially high only adjacent to edges on their
light sides. (The idea that the human visual
system "colors in" regions based on the gray
levels adjacent to their edges is well known to
perception psychologists.) Specifically, for
each of the masks shown in the preceding section,
we add the difference value $e_i$ to the points
marked by 1's. This is done for every mask in
every position. The result is an array of
"borderness" values which are high on the light sides
of edges and low elsewhere.

We can now initialize $p_b$ and $p_w$ based
on a combination of the gray level and the
borderness value at each point. Let $B$ be the maximum
borderness value in the image, let $\beta$ be the
value at point P, and let $p_\beta = \beta/B$. Let
$p_w^* = a\,p_w + (1-a)\,p_\beta$, where $0 \le a \le 1$, and let $p_b^* = 1 - p_w^*$.
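
The combined initialization is a weighted average of the gray level probability and the normalized borderness, with weight a on the gray level term. A minimal sketch, assuming NumPy and a precomputed borderness array (the masks that produce it are not reproduced), with a hypothetical function name:

```python
import numpy as np

def combined_initial_probabilities(p_w, borderness, a):
    """Initial light and dark probabilities mixing gray level and borderness.

    p_w        : gray-level-based light probability at each point.
    borderness : array of borderness values (high on light sides of edges).
    a          : weight on the gray level term, 0 <= a <= 1.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    beta = np.asarray(borderness, dtype=float)
    B = beta.max()
    # Normalized borderness; all zeros if the image has no borderness.
    p_beta = beta / B if B > 0 else np.zeros_like(beta)
    p_w_star = a * np.asarray(p_w, dtype=float) + (1.0 - a) * p_beta
    return p_w_star, 1.0 - p_w_star
```

With a = .25 a light point far from any edge starts with a light probability of at most .25, so light patches that are not bounded by edges start out weak.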

The results obtained when we use the initial
values $p_w^*$ and $p_b^*$, rather than $p_w$ and $p_b$,
for $a = .25$ and $.5$, are shown in Figures 4-5,
respectively. (Results for the light/dark
process alone using these initial values are shown
in Figures 4'-5'.) The coefficients are the same
as those used earlier; only the initial values
are different. We see that there is good
improvement over the results in Figure 3, as
regards inhibiting the emergence of light patches
from the background.


POSSIBLE  EXTENSIONS 

This paper has illustrated the advantages of
allowing interactions between relaxation
processes. This approach can be regarded as using
convergent evidence at the probabilistic or fuzzy
level, prior to making any commitment to a firm
decision.

An alternative idea, currently under
investigation, is to use "inside" and "outside" labels
in addition to the "light" and "dark" labels.
Initially, most points would have equal
probability of being inside or outside (assuming we do
not know whether the blobs are light and the
background dark or vice versa), except that points
adjacent to an edge on the side away from the
center of curvature have higher probability of
being outside, while those on the side toward the
center of curvature have higher probability of
being inside. These probabilities can then
reinforce one another, i.e., inside reinforces
inside and outside reinforces outside for a
neighboring pair of points, to the extent that
they are not separated by an edge. Initially,
the probabilities at a given point of the blob
border will depend on whether the border is
convex or concave at that point; but eventually the
interior of the blob should be uniformly labeled
"inside", and the exterior "outside", with high
probability. Note that this method can also be
used to label the interior of a closed curve,
even if it does not differ in gray level from
the exterior. Processes of this type might be
used to model figure-ground ambiguity (the Rubin
vase). Similarly, processes involving light/dark
labels might be used to model pseudoedges
(the Craik-O'Brien-Cornsweet phenomenon) and
virtual edges.

An interesting possible extension would be to
use a (light/dark)/(edge/no edge) relaxation
process at several different resolutions (e.g.,
at each level of a "pyramid"), and allow the
levels to interact. Note that blobs look like
(local) spots at the appropriate level, so that
immediate interaction between the blob interior
and the edges on all sides of it becomes possible;
in the process described in this paper, on the
other hand, a considerable number of iterations
is required in order for the parts of a large
blob to reinforce one another. Thus extending
blob-detection relaxation to a pyramid
representation would permit fast interactions among all
parts of the blob; this is an example of the
speed advantage of a pyramid cellular processor
over an ordinary cellular array. Similar remarks
apply to the inside/outside labelling process.

The pyramid extension, which is currently
being explored, also provides some other possible
benefits. A compact blob maps into a spot at
some level of the pyramid, while an elongated
(but relatively straight) blob maps into a line
segment. By suitably designing the interlevel
interactions, one could bias the process to
prefer elongated blobs over compact ones, or vice
versa. This would provide an interesting basis
for using primitive types of shape information
from the very beginning of the segmentation
process, rather than performing general-purpose
segmentation and then rejecting regions that have
the wrong type of shape. Conceivably,
multilevel interactions in a pyramid may also provide
a basis for (fuzzy) hierarchical representation
of complex shapes, but it is not yet clear how to
do this. Some other mechanism than a pyramid
would certainly be necessary for representing
the shapes of curves.

Interactions among relaxation processes may
also be useful in processing multiple images,
e.g., in detecting objects in stereopairs or in
time sequences of images. This would involve
iterative estimation of disparity or velocity
at each point concurrently with the estimation
of lightness/darkness and edgeness. Some work
on the time sequence applications is currently in
progress, and will be reported elsewhere.

Figure 3. Eight iterations of the combined
process.


Figure  4 


Same  as  Figure  3,  but  using  Initial 
values  based  .25  on  gray  level  and 
.75  on  "borderness"  (see  text). 


Figure  4'.  Same  as  Figure  4,  for  the  light/dark 
process  only. 


Figure  5.  Same,  but  using  (.5,. 5)  Initial 
values. 


Figure  5'.  Same  as  Figure  5,  for  the  light/dark 
process. 
ABSTRACT


We describe a set of techniques for
identifying and classifying vehicles in aerial
images of road scenes. The approach is knowledge-
based and draws on three sources of knowledge:
generic knowledge about the domain, a data base
containing information specific to the site, and
data associated with the image itself.
Understanding shadows is crucial to successful
scene understanding in this domain, and we present
three techniques for dealing with them. After a
correlation road tracker locates a road and
identifies visual anomalies, the anomalies are
examined by a set of expert subroutines that seek
to establish whether the anomaly is the image of a
vehicle (plus its shadow) or of something else.
Vehicles may be separated into types according to
their dimensions.


I INTRODUCTION 


One of the overall goals of the SRI Image
Understanding project is to explore the ways
knowledge can be used on problems of interpreting
aerial imagery [1]. It has been demonstrated
repeatedly that the more knowledge is available and
used in artificial intelligence programs, the
better the results tend to be. The ideal system
should be flexible enough to use the information
that it has, but also to be able to function,
albeit with somewhat reduced capabilities, if some
of the information is not available.

This paper describes an approach to finding
and identifying vehicles in aerial images, using
diverse sources of knowledge. The following
traffic-monitoring scenario provides a domain for
this work: Given a digital aerial image and a data
base, the problem is to detect vehicles on the road
and to classify them as to vehicle type. The image
should have sufficient spatial resolution to allow
recognition (about one foot per pixel, minimum).
Figure 1 shows a typical image of an area
containing a freeway interchange.


The data base contains information about some
limited geographical area of interest. As a
minimum, it should have the locations of known
roads in the area. Other relevant information
could include (but not be limited to):

• Road  width 

• Brightness  profiles  across  the  road 

• Terrain information

• Buildings,  railroads,  and  other  cultural 
features 

• Intersections,  overpasses,  and  access  roads 

• Signs  and  permanent  road  markings 

• Previous  photo  coverage  of  the  area,  in 
digital  form. 


Figure  1 An  Aerial  Road  Image 
A calibration procedure [2] establishes
correspondence between image coordinates and
geographic coordinates, allowing us to convert
quickly back and forth between coordinates in the
data base and pixel locations in the image. A road
tracker [3] uses the road location predicted by
the data base to trace the road centerline and
boundaries by correlating successive profiles
perpendicular to the road direction. Areas where
the image disagrees with the expected road profile
are identified as "anomalies." These areas are
passed to the classification routines for further
scrutiny.

Many  different  conditions  could  give  rise  to 
an  anomaly.  Vehicles  usually  show  up  this  way,  but 
so  do  the  shadows  of  objects  off  the  road  (trees, 
buildings,  signs,  utility  poles),  overhanging 
trees,  painted  markings  on  the  road,  and  changes  or 
irregularities  in  the  road  surface  (such  as  tar 
patches).  There  are  also  some  less  usual 
situations  with  which  a practical  system  ought  to 
deal,  such  as  road  construction,  floods,  bomb 
craters,  smoke,  and  dust  clouds.  The  classifier 
must  first  decide  if  the  anomaly  arises  from  a 
vehicle  or  from  some  other  cause.  Only  then  can  it 
proceed  to  classify  the  vehicle  type. 

Although  the  scenario  assumes  some  rather 
specific  resources  and  goals,  the  knowledge-based 
approach  we  have  developed  for  vehicle 
classification  is  generally  applicable  to  a wide 
range  of  object-recognition  tasks  in  cartography 
and  photo-interpretation. 


II  SOURCES  OF  INFORMATION 


A wide  variety  of  information  can  be  helpful 
for  detecting  and  classifying  vehicles.  We  can 
identify  three  kinds  of  knowledge  relevant  to  this 
problem:  knowledge  about  the  problem  domain 
(generic  knowledge),  knowledge  about  the  site  (the 
data  base),  and  knowledge  about  a particular  place 
and  time  (information  associated  with  the  image). 

Generic knowledge includes information that
can be deduced from functional descriptions. A
road is a narrow, linear region upon which vehicles
may travel. The road is usually continuous in the
image; if it appears discontinuous, it may be that
there are obstructions, or there may be shadows or
discolorations on the road surface. Roads have
minimal variation in the direction of travel but
may have considerable variation in the
perpendicular direction, because of the different
compositions of road bed and shoulders, and an
expected pattern of oil stains in the center of
each lane. We have some idea of the expected
shapes of vehicles viewed from different angles,
and an expectation that they probably will be
aligned parallel to the road direction. Our
illumination models take into account the physics
and geometry of shadows, and we can sometimes use
shadows to make inferences about objects. We know
the usual places where road signs, utility poles,
and painted road markings are located. All the
foregoing can be used to make sense out of a road
scene.

The  data  base  is  a useful  source  of 
information.  Its  principal  use  is  to  predict  the 
approximate  road  centerline  so  that  the  road 
tracking subroutines can operate. But other kinds
of information can be brought into play. Terrain
information can be used to refine position
estimates when the viewing angle is not vertical,
and to predict shadows better if the ground slopes.
Classifying shadows of objects off the road is very
much simplified when it is known what objects are
likely to cast shadows. Ambiguous anomalies in the
image can sometimes be distinguished if a picture
can be compared with a previous one or, better yet,
if the data base states what anomalies were found
in previous images and how they were classified.
Intelligence  reports  and  expected  traffic 
conditions  can  help  the  program  decide  what  to  look 
for  or  what  strategies  to  use. 

The  greatest  single  source  of  data  is  the 
image itself. It is easy to overlook some
information  that  is  associated  with  the  image  but 
may  not  be  in  the  actual  raster.  For  example,  it 
is  usually  possible  to  ascertain  (at  least 
approximately)  the  altitude,  position,  and  heading 
of  the  aircraft  from  which  the  image  was  taken. 
Scaling  parameters,  view  angles,  and  compass 
headings  can  be  derived  by  calibration.  If  the 
time  and  date  the  picture  was  taken  are  known,  the 
sun  position  can  be  calculated,  but  even  without 
these  data  the  sun  position  usually  can  be 
estimated  from  shadows. 

In  short,  detection  and  classification  of 
vehicles  is  not  based  solely  on  what  is  in  the 
image.  In  the  following  sections,  we  detail  some 
of  the  ways  we  use  the  available  information. 


III USE OF THE CORRELATION ROAD TRACKER


We depend on the correlation road tracker
designed by Quam [3] to isolate anomalies in
images of roads. These are regions where attention
should be focused.

The road tracker is based on the assumption
that variations in road surface materials,
centerlines, and intralane wear patterns correspond
linearly to the road itself. Vehicles and other
anomalies, however, stand out as being quite
different from the pattern of the road. Detecting
these anomalies is important to the operation of
the road tracker. Where substantial disagreement
occurs between successive profiles, the
corresponding pixels are marked as anomalies, so
that these points can be eliminated from the
correlation calculations. If the anomalies were
not so masked, they would perturb the location of
the correlation peak and introduce errors.
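
The effect of masking can be illustrated with a small sketch. It is not Quam's tracker [3]: it omits the search over lateral shifts, and the profile values, the threshold, and the function names are assumptions made for illustration only. It shows how one bright object depresses the correlation between two otherwise identical profiles, and how excluding the flagged samples restores it.

```python
def masked_correlation(ref, cur, mask):
    """Normalized correlation of two profiles over the unmasked samples only."""
    pairs = [(r, c) for r, c, m in zip(ref, cur, mask) if not m]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_r = sum(r for r, _ in pairs) / n
    mean_c = sum(c for _, c in pairs) / n
    num = sum((r - mean_r) * (c - mean_c) for r, c in pairs)
    den = (sum((r - mean_r) ** 2 for r, _ in pairs) *
           sum((c - mean_c) ** 2 for _, c in pairs)) ** 0.5
    return num / den if den else 0.0


def mark_anomalies(ref, cur, threshold):
    """Flag samples that differ from the reference profile by more than threshold."""
    return [abs(c - r) > threshold for r, c in zip(ref, cur)]


# Hypothetical brightness profiles taken across the road.
road = [10, 12, 30, 12, 10, 12, 30, 12, 10]   # expected road profile
scan = [10, 12, 30, 12, 60, 60, 30, 12, 10]   # same profile with a bright object

mask = mark_anomalies(road, scan, threshold=20)
unmasked = masked_correlation(road, scan, [False] * len(road))
masked = masked_correlation(road, scan, mask)
```

In this toy case the unmasked correlation is low, while the masked correlation is essentially 1, since the remaining samples agree exactly.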
Figure 2a shows a representative excerpt from
the area covered by the image of Figure 1. The
road tracker is initiated by specifying a single
profile approximately perpendicular to the road
direction and centered on it. This initial
baseline is selected manually now, but facilities
exist for using the data base to draw the baseline
automatically.


The road tracker produces several forms of
output. As indicated by Quam [3], the program can
produce a point list describing the track of the
road center, and a binary image of all points in
the road that are anomalous. But for vehicle
identification, another form of output has been
added. The road reflectance model may be
subtracted from each pixel considered, resulting in
a difference image that has the road profile
subtracted out. Figure 2b shows the baseline, the
road center, and anomalies detected. Figure 2c
shows the difference image.


Figure  2a  Road  Scene 


The grey-scale difference image may be
converted to a binary anomaly image by
thresholding. Although we now use a threshold
identical to the one used by the road tracker to
produce the anomaly image, the threshold value
could be adjusted as a function of various
considerations, such as success or failure of a
previous analysis.


In the difference image, shadows tend to have
a relatively uniform intensity, even though the
road reflectance profile varies considerably. If
we adopt the simplifying assumptions that any
object casting a shadow may be approximated by a
half plane of infinite extent that hides all but a
fixed proportion of the sky, and neglect reflected
illumination from nearby objects, then the ratio of
intensities across the shadow edge should not
depend on the reflectivity of the underlying
surface. If the original image is digitized on a
logarithmic brightness scale, this constant ratio
becomes a constant intensity in the difference
image. Because the assumptions are approximate at
best, the constant-difference test is almost never
exact. Nonetheless, by subtracting the road
profile from the image, we can expect the intensity
of shadows to be more uniform in the difference
image than in the original one.
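
The constant-difference argument can be checked numerically. The sketch below assumes an idealized image in which brightness is illumination times reflectance and the shadow passes a fixed fraction of the light. The reflectance values, the illumination level, and the shadow fraction are invented for illustration.

```python
import math


def log_digitize(intensity):
    """Idealized logarithmic digitization (no quantization or noise)."""
    return math.log(intensity)


# Hypothetical reflectance profile across the road (varies considerably).
road_reflectance = [0.10, 0.25, 0.12, 0.30, 0.11]
illumination = 100.0
shadow_fraction = 0.2   # assumed fraction of light that reaches a shadowed surface

lit = [log_digitize(illumination * r) for r in road_reflectance]
shadowed = [log_digitize(illumination * shadow_fraction * r)
            for r in road_reflectance]

# Subtracting the road profile leaves the same value at every position.
difference = [s - l for s, l in zip(shadowed, lit)]
```

Every entry of `difference` equals log(0.2) up to rounding, regardless of the underlying reflectance, which is the uniformity the difference image is expected to show inside a shadow.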


Figure  2b  Baseline,  Centerline,  and  Anomalies 


On the other hand, when anomalies are caused
by vehicles, subtracting the road profile will
cause its inverse to be superimposed on the
anomaly. Figures 3a and b show an original image
and a difference image (from another road site)
that demonstrate these peculiarities. Both kinds
of image are useful in classifying anomalies.


As the road tracker proceeds, it constantly
keeps track of the average correlation between
successive road profiles at their optimum
locations. This correlation value, a useful
estimate of noise in the picture, is made available
to succeeding classification stages.


Figure  2c  Difference  Image 


It should be possible in principle to automate this
procedure, for example by using the data
base to predict or find known shadows.
Alternatively, it seems likely that a formula can
be derived that will give the expected distribution
based on calibration of photometry.


In situations in which the correlation road
tracker is not applicable, shadows located by the
brightness model might indicate areas of the
picture that deserve scrutiny.


Figure  3b 
Difference  Image 


Figure  3a 
Original  Image 


IV  SHADOWS 


An understanding of shadows is crucial to
making sense out of high-resolution aerial images.
The scene is always out-of-doors, and is usually
illuminated by direct sunlight because
photoreconnaissance missions are flown mainly in
clear weather during the day. Sunlight produces
deep, dark shadows. Frequently shadows are the
most prominent visual feature of an image.


Figure  4 Vehicle  with  Shadow 


For vehicle classification, many of the
anomalies the classifier is called on to consider
are only the shadows of objects off the road, such
as trees, signs, or utility poles. All vehicles
cast shadows, and, unless the boundary between the
vehicle and its shadow can be determined,
classification on the basis of shape is hopeless.
Furthermore, the existence or nonexistence of a
shadow can aid in deciding whether or not a given
anomaly is a vehicle. The size and shape of the
shadow can give valuable clues to the height of the
vehicle and its profile. As a dramatic
demonstration of this, consider the vehicle shown
in Figure 4. Because the reflectance of the
vehicle is almost the same as that of the road, the
vehicle might have gone unnoticed were it not for
the shadow. But the shadow not only gives away its
position, it tells us the vehicle is probably a
Volkswagen "beetle."


Figure  5a 
Original  Image 


Figure  5b 
Difference  Image 


We have a number of techniques at our disposal
for identifying shadows. The simplest is based on
the brightness model. The technique is simply to
search for all pixels in the image whose intensity
is in the range of values expected for shadows.
This works somewhat better in the difference image
than in the original, because the effects of
variation in the road surface are reduced. Figure
5 shows the central portion of the area analyzed
in Figure 2, which we will use to illustrate
shadow-finding techniques. Figure 6 shows the
shadows extracted from Figure 5b by this method.


In our work so far, the expected range of
shadow intensities has been inferred from the
statistics of areas manually indicated as shadows.
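
A minimal sketch of the brightness criterion follows. The text does not say which statistics are used, so the range is taken here as the mean plus or minus two standard deviations of hand-marked shadow samples. That choice, the sample values, and the function names are all assumptions.

```python
from statistics import mean, pstdev


def shadow_range(samples, k=2.0):
    """Expected shadow range as mean +/- k standard deviations (assumed rule)."""
    m, s = mean(samples), pstdev(samples)
    return m - k * s, m + k * s


def find_shadow_pixels(image, lo, hi):
    """Binary image marking pixels whose value lies in the range [lo, hi]."""
    return [[lo <= p <= hi for p in row] for row in image]


# Hypothetical difference-image values sampled from a hand-marked shadow.
training = [-42, -40, -38, -41, -39]
lo, hi = shadow_range(training)

# Hypothetical 3x3 excerpt of a difference image.
difference_image = [[0, -40, -41],
                    [3, -39, 25],
                    [-1, 2, -60]]
shadow_mask = find_shadow_pixels(difference_image, lo, hi)
```

With these numbers the range is roughly -42.8 to -37.2, so the three pixels near -40 are marked and the very dark pixel at -60 is not.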


Figure  6 Shadows  found  by  Brightness  Criterion 
Another device, based upon a predictive model,
depends  on  knowing  the  sun  angle.  The  shadow  of 
any  raised  object  is  always  on  the  side  away  from 
the  sun,  and  if  the  height  of  the  object  is  known, 
the  width  of  the  shadow  can  be  predicted.  Figure 
7b  shows  the  areas  identified  as  shadow  from  the 
image  of  Figure  5b  by  thresholding  the  difference 
image  to  locate  anomalies,  and  assuming  each 
anomaly  to  be  due  solely  to  an  object  five  feet 
tall  plus  its  shadow. 
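
The prediction itself is elementary trigonometry and can be sketched as follows, assuming level ground, a vertical object, and a sun azimuth measured clockwise from north. The function name and the coordinate convention are assumptions, not taken from the programs described here.

```python
import math


def shadow_offset(height, sun_elevation_deg, sun_azimuth_deg):
    """Ground displacement (east, north) of the shadow tip of a vertical object.

    Assumes level ground. Azimuth is measured clockwise from north,
    and the result is in the same units as height.
    """
    length = height / math.tan(math.radians(sun_elevation_deg))
    away = math.radians(sun_azimuth_deg + 180.0)   # shadow falls away from the sun
    return length * math.sin(away), length * math.cos(away)


# An object five feet tall with the sun due south at 45 degrees elevation.
east, north = shadow_offset(5.0, 45.0, 180.0)
```

Here the shadow extends about five feet due north. At one foot per pixel, that corresponds to a shadow about five pixels wide on the side of the anomaly away from the sun.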


Figure  7 Shadows  Found  by  Predictive  Criterion 


The third technique is based on a projective
model. It tries to look directly for the shadow
edge. Vehicles tend to be rectangular when viewed
from above, and, unless the sun is directly ahead
of or behind the vehicle, there will be a long,
straight edge separating the vehicle from its
shadow. This edge can usually be found by
performing a Hough transform [4] on the gradient
of  the  image,  or  (equivalently)  by  projecting  the 
gradient  onto  axes  oriented  in  various  directions 
and  finding  the  direction  from  which  the  gradient 
points  tend  most  to  reinforce  each  other.  However, 
much  better  results  are  obtainable  when  the 
direction  of  the  edge  is  known  or  assumed  a priori. 
Such  is  usually  the  case,  for  vehicles  tend  to  be 
oriented  parallel  to  the  road  direction. 

An  example  of  shadow  detection  by  projection 
is  presented  in  the  next  section. 

The  three  techniques  are  based  on  different 
sets  of  assumptions  and  are  applicable  in  different 
circumstances.  The  projective  method  is  useful 
only  for  finding  shadows  of  vehicles.  The 
predictive model is more generally useful, being
applicable  to  objects  off  the  road  as  well  as  on 
it.  The  brightness  model  makes  no  assumptions 
about  the  object  casting  the  shadow--it  only 
requires  that  the  background  on  which  the  shadow  is 
cast  be  relatively  uniform. 


V CLASSIFICATION  OF  ANOMALIES 


For classifying anomalies, we have chosen to
construct a number of "expert" subroutines, each of
which tests a specific hypothesis. For example,
the vehicle expert determines whether or not a
given anomaly could be a vehicle (plus its shadow),
and if so, attempts to say whether the vehicle is a
car or a truck. The tree-shadow expert tries to
say whether or not the anomaly could be the shadow
of an object off the road, and the road-marking
expert similarly looks for painted markings. Other
expert modules could easily be integrated into the
scheme. The experts operate in parallel, each
expert forming its decision without interacting
with the other experts. The top level program
chooses the most likely interpretation of the
anomaly. If no expert subroutine is able to
account for the anomaly, it is labelled
"unclassified."

The vehicle expert is the most involved of the
expert subroutines. It first examines the overall
size (area) of an anomaly. If the anomaly is too
small or too large, it is rejected. Next, a search
is made for long edges that might be the sides of
the car, by projecting the gradient image to a
baseline. For the projection a binary mask is used
so that only those points near the anomaly are
considered; the mask is generated by expanding
("growing") the anomaly three pixels. Figure 8a
shows the results of applying a gradient operator
to the image of Figure 5a. The masked gradient was
projected on the axis drawn in Figure 8b, where
the average projected gradient magnitude is
plotted.
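
The masked projection can be sketched as follows. This is an illustration under assumptions, not the program used for Figure 8: the growing step uses a square neighbourhood, positions along the axis are rounded to whole-pixel bins, and the function names and toy data are invented.

```python
import math
from collections import defaultdict


def grow(mask, n):
    """Expand a binary mask by n pixels in every direction (square neighbourhood)."""
    h, w = len(mask), len(mask[0])
    return [[any(mask[yy][xx]
                 for yy in range(max(0, y - n), min(h, y + n + 1))
                 for xx in range(max(0, x - n), min(w, x + n + 1)))
             for x in range(w)]
            for y in range(h)]


def project_gradient(gradient, mask, axis_angle_deg):
    """Average gradient magnitude of masked pixels, binned by position along an axis."""
    c = math.cos(math.radians(axis_angle_deg))
    s = math.sin(math.radians(axis_angle_deg))
    total, count = defaultdict(float), defaultdict(int)
    for y, row in enumerate(gradient):
        for x, g in enumerate(row):
            if mask[y][x]:
                b = round(x * c + y * s)   # position of the pixel along the axis
                total[b] += g
                count[b] += 1
    return {b: total[b] / count[b] for b in sorted(total)}


# Toy data: a one-pixel anomaly and a gradient image with three vertical edges.
anomaly = [[False] * 8 for _ in range(3)]
anomaly[1][4] = True
mask = grow(anomaly, 3)
gradient = [[4, 9, 0, 5, 0, 7, 0, 0] for _ in range(3)]
profile = project_gradient(gradient, mask, 0.0)
```

In this toy profile the three peaks at positions 1, 3, and 5 play the role of the road/car, car/shadow, and shadow/road boundaries. Column 0 lies outside the grown mask and does not contribute.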


Figure  8 Use  of  Projection  to  Find  Shadow  Edges 


A line  perpendicular  to  the  direction  of  the 
road  is  used  as  an  initial  baseline.  If  some 
evidence  of  edges  is  found,  small  perturbations  are 
made  to  the  orientation  to  find  a local  maximum. 

If  the  edges  are  not  found,  a global  search  is  made 
for  a direction  of  projection  that  will  show  the 
edges.  If  the  edges  are  not  found  again,  the 
anomaly  is  rejected. 

Note  that  there  are  three  peaks  in  the  plot, 
corresponding  to  the  boundaries  between  road  and 
car,  between  car  and  shadow,  and  between  shadow  and 
road. The three highest peaks in the projected gradient
are examined to see if they are in the correct
relationship.  Average  brightness  is  projected  to 
the  same  baseline  to  see  if  the  brightness  of  the 
shadow  portion  is  appropriate.  A figure  of  merit  is 
computed  from  these  tests,  indicating  the  closeness 
of  measured  spacing  and  brightness  to  the  expected 
spacing  and  brightness.  The  figure  of  merit  is  used 
later  in  choosing  the  most  likely  interpretation  of 
the  anomaly. 
The average width of the shadow and the
location of the sun may be used to estimate the
height of the vehicle. A tolerance or range of
uncertainty is also computed at this time, because
the combination of low spatial resolution and a
disadvantageous sun angle may make the height
figure not particularly useful. A nominal height
of 6 feet is used for predicting a shadow to the
front or the rear of the vehicle, and this
predicted shadow length subtracted from the length
of the original anomaly gives the length of the
vehicle.
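
One way these estimates might be computed is sketched below. The geometry is an assumption: level ground, a shadow width measured perpendicular to the road, and a tolerance taken as the height change caused by a one-pixel error in that width. The text does not give the formulas actually used, and the function names are hypothetical.

```python
import math


def height_from_shadow(shadow_width, sun_elevation_deg, sun_to_road_deg,
                       pixel_size=1.0):
    """Estimate (height, tolerance) from the shadow width across the road.

    sun_to_road_deg is the angle between the sun azimuth and the road
    direction. Returns None when the sun lies along the road, because
    the shadow then has no sideways extent to measure.
    """
    across = abs(math.sin(math.radians(sun_to_road_deg)))
    if across < 1e-6:
        return None
    scale = math.tan(math.radians(sun_elevation_deg)) / across
    return shadow_width * scale, pixel_size * scale


def vehicle_length(anomaly_length, sun_elevation_deg, sun_to_road_deg,
                   nominal_height=6.0):
    """Anomaly length minus the shadow predicted along the road direction."""
    shadow = nominal_height / math.tan(math.radians(sun_elevation_deg))
    along = shadow * abs(math.cos(math.radians(sun_to_road_deg)))
    return anomaly_length - along


# Sun at 45 degrees elevation, 30 degrees off the road direction, 1 foot per pixel.
height, tolerance = height_from_shadow(2.5, 45.0, 30.0)
length = vehicle_length(20.0, 45.0, 30.0)
```

With these inputs the height comes out near 5 feet with a tolerance near 2 feet, which illustrates how a sun direction close to the road direction makes the height figure unreliable.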

Classification as to vehicle type is
relatively crude at this time. If the overall
length of the vehicle is greater than 20 feet, or
if the height can reliably be stated to be greater
than 6 feet, the vehicle is called a "truck."
Otherwise it is called a "car."
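
The rule can be written down directly. In this sketch, "reliably" is interpreted as the height estimate minus its tolerance exceeding 6 feet. That interpretation, like the function name, is an assumption.

```python
def vehicle_type(length_ft, height_ft=None, height_tolerance_ft=None):
    """Classify a vehicle as "truck" or "car" from its dimensions in feet."""
    if length_ft > 20:
        return "truck"
    if height_ft is not None and height_tolerance_ft is not None:
        # Assumed meaning of "reliably": the lower bound still exceeds 6 feet.
        if height_ft - height_tolerance_ft > 6:
            return "truck"
    return "car"


examples = [
    vehicle_type(35.0),              # long vehicle
    vehicle_type(15.0, 9.0, 1.0),    # short but reliably tall
    vehicle_type(15.0, 7.0, 2.0),    # height too uncertain to count
    vehicle_type(14.0),              # no usable height estimate
]
```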

Another expert subroutine identifies shadows
of objects off the road. To qualify as such a
shadow, an anomaly must have an average brightness
lower than the average road brightness, and extend
to the edge of the road on the side nearer the sun.
A figure of merit is calculated from how well the
average brightness (in the difference image)
corresponds to the predicted value, and from the
variance of brightness inside the anomaly.

The  expert  on  painted  markings  on  the  road  is 
similar  to  the  shadow  expert.  Painted  markings  are 
always  brighter  than  the  road  surface  and  limited 
in total area. The figure of merit is based only
on variance of brightness; a much lower variance is
expected  for  road  markings  than  for  shadows. 


VI  DISCUSSION 

The state of our experiments in anomaly
classification is such that it is too early to
report any quantitative results. However, we can
say, qualitatively at least, that the methods
outlined above succeed on the easy cases and break
down on the difficult ones. We have tested our
programs on approximately 20 different scenes
extracted from three diverse road areas. Where
good contrast exists between an anomaly and the
road, and (in the case of vehicles) the shadow is
visually distinct from the object casting it, we
have little difficulty in obtaining a correct
identification. Where conditions are not as good,
the programs tend to fail to make any
identification rather than to come up with a
misclassification. Additional robustness in the
classifier will be necessary to enable it to handle
unusual cases.

The various expert subroutines are not now
integrated in any way. Each reports its figure of
merit to the top-level program, which selects among
the hypotheses. A more useful system should allow
interaction among the various experts.
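
The present, non-interacting arrangement amounts to taking the hypothesis with the largest figure of merit. The sketch below illustrates that control structure only. The expert functions, their thresholds, and the 0-to-1 merit scale are placeholders invented for the example, not the tests described in Section V.

```python
def classify_anomaly(anomaly, experts):
    """Run each expert independently and keep the highest figure of merit.

    Each expert returns a figure of merit, or None if it rejects the
    anomaly. If every expert rejects it, the label is "unclassified".
    """
    best_label, best_merit = "unclassified", 0.0
    for label, expert in experts.items():
        merit = expert(anomaly)
        if merit is not None and merit > best_merit:
            best_label, best_merit = label, merit
    return best_label


# Placeholder experts with invented thresholds and merit values.
experts = {
    "vehicle": lambda a: 0.8 if 50 <= a["area"] <= 400 else None,
    "off-road shadow": lambda a: 0.6 if a["mean_difference"] < 0 else None,
    "road marking": lambda a: (0.7 if a["mean_difference"] > 0
                               and a["area"] < 50 else None),
}

labels = [
    classify_anomaly({"area": 120, "mean_difference": -30}, experts),
    classify_anomaly({"area": 20, "mean_difference": 15}, experts),
    classify_anomaly({"area": 1000, "mean_difference": 5}, experts),
]
```

Because no expert sees another's result, an anomaly that is partly tree shadow and partly vehicle can only receive one of the labels, which is the limitation taken up in the remainder of this discussion.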

Figure 3 shows a good example of a case that
could be handled by cooperation of the tree-shadow
and the vehicle experts. It might be sufficient if
the shadow expert were to realize that it could
interpret part of the anomaly, subtract the
explainable part, and ask the other experts to
classify what remains. The vehicle expert would
have to take the situation into account and not
look for a separate shadow for this anomaly.

Figure 9 is difficult to analyze without
higher-level knowledge. A more direct link to the
data base would be particularly useful in this
case, enabling us to separate the anomaly into
portions that are "expected" (the visible portions
of the arrow) and "not expected" (the car and its
shadow).

Much generic knowledge tends to be expressed
in the coding of the computer programs that analyze
pictures. In this form it is inflexible; adding
new knowledge involves writing new computer
programs. A long-range goal of this research is to
find new ways of expressing this kind of
information, for example in the form of rules or
templates. Such a capability would lead to highly
competent computer visual capabilities that would
greatly enhance interactive and automatic
cartography and photo-interpretation.


Figure  9 A Vehicle  over  a Road  Marking 
Abstract

We describe an algorithm for linking straight edge
elements produced from a bottom up line finding stage. Edges
are linked by a best first search algorithm. Heuristics are used
both to select candidate edges for linking and to prune the
search tree. A list of chains of edges is the result of this stage of
processing. The choice of heuristics determines the chains of
edges produced. Further heuristics prune the list of chains.
Finally regions, described as ribbons, are chosen so that their
boundaries are approximated by the chains of edges. Individual
picture elements may appear in multiple ribbons. Further
heuristics can be employed to reduce this multiplicity. We
describe the algorithm and its implementation. It is organized so
that a higher level computational procedure can easily direct
this low level algorithm by supplying and altering the heuristics
during processing. A selection of useful heuristics is described
and then shown working on an example picture, with a hand
simulated control program, resulting in extremely useful
descriptions of the image.


Introduction 

A number of people have used goal-directed heuristic
search methods for low level vision. These algorithms have
primarily dealt with pixel data. The heuristics used have been
either fixed, or dependent on a few parameters. Some of these
parameters are altered depending on image quality, and others
depending on the contour being sought.

Many of those using these approaches have tried to locate
ribs and tumors in radiographs of human chest cavities. For
instance, Ashkar and Modestino [1978] use a standard best first
tree search algorithm (they call this the "Zigangirov-Jelinek stack
algorithm") to follow contours. The branches of the tree arise
from examining the pixels neighboring the node pixel which
are not already members of the path. The metric, or evaluation
function used, measures the continuity of the contour,
smoothness of curvature and a comparison to a prototype
contour. The weighting of these three factors is altered
depending on the noise in the image, relying less on the
prototype for cleaner pictures. Since their example domain is
finding human ribs in x-ray images they can hopefully be
confident of an image uncluttered with unexpected objects. This
method relies on the existence of either good quality
uninterrupted contours or good predictions of the contour.

Ballard and Sklansky [1978] look for tumors, and
Wechsler and Sklansky [1977] for ribs. In both cases they obtain
an approximate area for the object of search, then use a
modified depth-first search method to follow contours. The best
successor pixel is taken at each stage of the search. Both these
systems rely on powerful techniques heavily dependent on the
application to constrain the search.

Martelli [1972] formulates the problem of edge following
as a shortest path graph search and uses the traditional A*
algorithm to carry it out. In Martelli [1976] he shows how a
priori knowledge of the shape of the contour can be embedded
in the cost function to constrain the search.

In this paper we will describe a system which uses
heuristic search to link edge elements (rather than pixels) into
contours, and further heuristics to evaluate the worth of
contours so produced. The system is quite general, and relies on
the heuristics supplied to constrain the problem. It is intended
that the system be controlled by other sections of a general
purpose vision system (the ACRONYM model-based system, see
Brooks, Greiner and Binford [1978a], [1978b]). Thus it needs to
be both powerful and easy to program.

We deal with unregistered images of cluttered scenes.
Simple bottom up procedures are not adequate to deal with such
a situation. In a typical example the algorithms we describe here
will be invoked many times, by a higher level program, which
will change the heuristics and associated parameters, depending
on its current belief of the scene contents, based on the previous
results from the edge linking routines.

The edges we use as data come from segment files
produced by Nevatia and Babu [1978]. They use directional
masks to detect edges which they thin, threshold and fit with
piecewise linear segments. The linear pieces are directed such
that their right sides are brighter than their left in the original
image. Further they are often linked into longer consistent edges,
which are called super segments. We will refer to individual
linear pieces as edge elements or segments. Figure 2 shows edge
elements produced by Nevatia and Babu from the photo in
Figure 1.


The  Edge  Mapping  Algorithms 

In this section we will outline the algorithms we have
developed for extracting region descriptions from edge data.
These algorithms are goal-directed and need a higher level
system to supply them with heuristics. We will refer to the
higher level system as the executive. The algorithms we describe
here are all implemented and running in MACLISP. They form
the prototype edge mapping module (we will occasionally refer
to it as EM) of the ACRONYM model-based vision system (see
Brooks, Greiner and Binford [1978a, 1978b] and Binford and
Brooks [1979]). The EM will have its heuristics supplied by,
and be directed by the predictor and planner module of
ACRONYM, via the matcher. We defer the discussion of
particular heuristics which might be used until the next section.
Often the heuristics mentioned below are described as
predicates. It should be noted that they can be parameterized
via global variables.

There are four relatively independent phases of the algorithm:
linking candidate contours, discarding unsatisfactory candidates,
finding ribbon descriptions for contours and the regions they
bound, and finally disambiguating redundant descriptions of a
single part of the two dimensional image. We discuss these four
phases in order below. In general we expect that for a given
goal, the first three of these phases would proceed in a serial
manner, although the results from a single invocation of the
phase of linking candidate contours may be used by multiple
invocations of the next two phases, with varying goals. The last
phase may occur as part of the actual matching process; we will
however demonstrate a useful heuristic, in the next section, for
use in a bottom up situation.


Fig.  1.  Digitised  photograph.


Linking  Candidate  Contours

Linking edges into a contour can easily be formulated as a
tree searching problem. Each node is an ordered list of edge
elements, with a score attached giving a measure of goodness of
the contour specified by the list. There is also an optional
direction. Recall that the edge elements are directed so that their
right sides are brighter than their left. A direction of 'right' will
be interpreted to mean that the edge list is following around a
relatively brighter region, and 'left' a darker region. The root
node of the tree is a list of a single edge element. A descendant
node m of a node n with edge list (e_n1, e_n2, ..., e_nk) will have
edge list (e_n1, e_n2, ..., e_nk, e_m); each descendant node extends
the contour by one new edge element. If a node has a direction then
its descendant nodes have the same direction. In its most general
form, each node has a descendant for every edge element in the
image which is not already on its edge list. To find the best
contour starting with a given edge element we must search the
tree to find the node with the best score.

There  are  four  issues  to  resolve  for  such  a tree  search.

1.  Which  edges  should  be  chosen  as  root  nodes  of  search  trees?

2.  The  tree  as  defined  will  clearly  be  enormous  (of  size  (n-1)!
where  n is  typically  }00  to  1000).  Therefore  the  tree  must  be 
pruned,  and  not  all  nodes  searched.  How  should  this  be  done? 

3.  What  scoring  function  should  be  used?

4.  In  what  order  should  the  tree  be  searched  and  when  should 
the  search  be  terminated? 

The  solutions  to  these  four  problems  incorporate  a capability  for
the  executive  to  direct  the  contour  finding  process.

To choose root nodes for the search trees, the executive
supplies a list of candidate edge elements and a list (called
root-tests, perhaps empty) of predicates on edge elements. An
edge which satisfies any of the root-tests has its search tree
expanded. Note that there is some redundancy in the two
mechanisms. This is to provide efficiency. The predicate list
provides a uniform selection procedure, and there is a single
instance of the control structure within the low level edge
mapping module. In many cases the executive will simply
provide a list of the edge elements to be pruned by the root-tests.
However, suppose the executive is only interested in a restricted
area of the total image and the edge elements are hashed on
their two dimensional location. It will then be more efficient for
special purpose routines within the executive to provide a list of
location specific edges, rather than use a predicate within
root-tests, as the latter cannot benefit from having the edges
hashed; it must test each edge in turn.
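
The efficiency argument above can be illustrated with a small
sketch. It is not the MACLISP implementation; the function names,
the representation of an edge as a pair of (x, y) points in the
unit square, and the default of 16 cells per axis are all
assumptions made for the illustration.

```python
from collections import defaultdict

def cell_of(point, cells):
    """Grid cell (column, row) of a point in the unit square."""
    x, y = point
    # Clamp so that points on the far border fall in the last cell.
    return (min(int(x * cells), cells - 1), min(int(y * cells), cells - 1))

def build_location_hash(edges, cells=16):
    """Bucket edges by the grid cell containing their starting point."""
    table = defaultdict(list)
    for edge in edges:
        table[cell_of(edge[0], cells)].append(edge)
    return table

def edges_starting_in(table, lo, hi, cells=16):
    """Edges whose start lies in the rectangle lo..hi.

    Only the buckets overlapping the rectangle are visited, whereas a
    predicate applied by the edge mapper would test every edge in turn.
    """
    (c0, r0), (c1, r1) = cell_of(lo, cells), cell_of(hi, cells)
    found = []
    for c in range(c0, c1 + 1):
        for r in range(r0, r1 + 1):
            for edge in table.get((c, r), ()):
                x, y = edge[0]
                if lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1]:
                    found.append(edge)
    return found
```

An executive interested only in a small window would call
edges_starting_in and hand the result to the edge mapper as its
list of candidate root edges.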

We use a best first search algorithm to search partially a
sub-tree of the total search tree. The sub-tree which may
actually be searched is defined by two lists of heuristics
provided by the executive: the producers and the reapers. As we
will see below, the ordering of producers is important for the
order in which the search tree is traversed, and so in the case of
incomplete search (the normal case), this ordering can affect the
search outcome. The ordering of reapers can affect the efficiency
of the search but not the outcome. The depth first search,
combined with the scoring function, determines the search order.

A best first tree search algorithm works as follows. A list
of nodes is kept in decreasing order of their score. Initially the
list contains only the root node, which is assigned score zero. A
node is said to be fully developed if all its descendants in the
sub-tree defined by the producers and reapers have already been
placed on the ordered node list. A node is partially
developed if it is not fully developed. Best first search finds the
first partially developed node on the ordered list (i.e. the
partially developed node with the highest score), calculates
another descendant and its score, and inserts the new node into
the list. This continues until there are n_nodes (a parameter
supplied by the executive) fully developed at the head of the
list. The result of the search is the edge list associated with the
node at the head of the list, which will be the highest scoring
node found during the search. Note that a node can never be
promoted on the ordered list. Further, the search for a node to
develop will never proceed past the first n_nodes, for then there
would have to be n_nodes fully developed, and so the tree search
would terminate. Thus only the first n_nodes of the ordered list
need be retained.
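
As a minimal sketch of the bounded best first search just
described (not the authors' MACLISP code), the following assumes
that descendants are supplied by a hypothetical successors
function, which takes the current edge list and returns
(edge, score increment) pairs; in the EM that role is played by
the producers and reapers. The Node class and all names are
illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, Iterator, Tuple

Edge = Hashable
Successors = Callable[[Tuple[Edge, ...]], Iterable[Tuple[Edge, float]]]

@dataclass
class Node:
    edges: Tuple[Edge, ...]                  # ordered edge list (the contour)
    score: float
    pending: Iterator[Tuple[Edge, float]]    # descendants not yet generated
    fully_developed: bool = False

def best_first(root: Edge, successors: Successors, n_nodes: int):
    """Return (edge list, score) of the head node when the search stops."""
    nodes = [Node((root,), 0.0, iter(successors((root,))))]
    while True:
        # First partially developed node among the first n_nodes.
        node = next((n for n in nodes[:n_nodes] if not n.fully_developed), None)
        if node is None:
            return nodes[0].edges, nodes[0].score
        step = next(node.pending, None)
        if step is None:
            node.fully_developed = True
            continue
        edge, gain = step
        if edge in node.edges:               # never repeat an edge in a contour
            continue
        child_edges = node.edges + (edge,)
        child = Node(child_edges, node.score + gain,
                     iter(successors(child_edges)))
        # Insert the child so the list stays in decreasing score order.
        i = 0
        while i < len(nodes) and nodes[i].score >= child.score:
            i += 1
        nodes.insert(i, child)
        del nodes[n_nodes:]                  # only the first n_nodes are kept
```

With a small n_nodes the search is incomplete, so the order in
which descendants are proposed can change the contour returned,
as noted above for the ordering of the producers.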

The producers are functions of an edge element and a
direction, used to produce descendants of a node. They are
applied to the most recent edge of a node, and the node's
direction. A producer returns a list of descendants, and a score
for each of those edges. When the search procedure wants to
develop a node, by finding another descendant, it checks to see
if there are any edges left from the last producer used. If not, it
calls the next producer. It continues until it has an edge element
which is neither already in the contour, nor has been previously
suggested by a producer. It also demands that if the parent node
had an associated direction, the exterior angle made by the new
edge element is within slop degrees of continuing in that
direction (slop is a parameter supplied by the executive). Finally
the search procedure tests the proposed new edge element with
the predicates in the list reapers. These are predicates of the edge
list from the parent node, the proposed new edge and the
direction of the parent node. When the search procedure finally
has a new edge it makes up a new node, with that edge added
to the contour, and a score from the parent node, incremented
by the score associated with the new edge.
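
The way a node is developed can be sketched as a lazy generator.
This is an illustration under assumptions, not the EM's code: the
name descendants is hypothetical, producers are taken to be
functions of (last edge, direction) returning (edge, score)
pairs, and a reaper is assumed to return true when the proposed
edge may be kept. The slop test on the exterior angle is not
shown; it could be supplied as one more predicate of the same
form.

```python
def descendants(contour, direction, producers, reapers):
    """Yield (edge, score) pairs for the next descendants of a node.

    Producers are consulted in the order given, and the next producer
    is only called once the previous one has run out of suggestions.
    """
    suggested = set(contour)        # edges in the contour or already proposed
    for produce in producers:
        for edge, score in produce(contour[-1], direction):
            if edge in suggested:
                continue
            suggested.add(edge)
            # Keep the edge only if every reaper predicate accepts it.
            if all(reap(contour, edge, direction) for reap in reapers):
                yield edge, score
```

Because the generator is lazy, asking for one more descendant
does only as much work as is needed to find the next acceptable
edge.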

Thus the decisions about which edges to link are mostly
made on the basis of very local information, namely the
previous edge linked, and the general rotational direction of the
contour. For small values of slop the contours tend to encompass
almost convex regions.

Discarding  Contours 

Since the contours are found using only very local
properties of the edge elements, it is necessary to make more
global checks before proceeding to fit ribbon descriptions. The
EM uses an executive supplied list of predicates, cullers, which
each take two arguments: a contour and the direction used in
finding that contour. A candidate contour is retained only if it
and its associated direction satisfy all the predicates in cullers.
Note that these predicates too may be parameterized by global
variables set by the executive. Also, the ordering of cullers
cannot affect the final outcome, but it can affect efficiency.


We describe the area defined by a contour as a ribbon. We
currently use a subset of the general definition of the class of all
ribbons. In particular, they must be defined by sweeping a
symmetric width element normally along a straight spine while
changing the width linearly with distance swept. We will call the
two lines defined by the width element as it is swept along the
spine, ribbon edges. Thus the ribbons we use can be fully
specified by a line segment (the spine) and a width at each end.

Fitting a ribbon to a contour is currently done by a fixed
algorithm with no mechanism for goal direction from the
executive. It proceeds as follows. A histogram of the angles of
the edge elements making up the contour (weighted for edge
length) is constructed at 20 degree intervals. The two peaks with
the largest areas, between local minima, are found, and the edge
elements which contribute to each are identified. The two edges
of the proposed ribbon description are fitted to these collections
of edge elements. First the mean angles (weighted for edge
element length) are calculated, and then straight lines,
constrained by these angles, are fitted by least squares, weighted
by length once again. The line whose angle is the mean of the
ribbon edge angles, and which is equidistant from the ribbon
edges, is calculated. The edge elements which defined the ribbon
edges are normally projected onto the center line, to define the
extremities of the spine. The width function for the ribbon can
then be easily calculated from the spine and the ribbon edges.
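
The first steps of the fit, the length weighted angle histogram
and the length weighted mean angle, can be sketched as follows.
The edge representation (a pair of (x, y) end points) and the
function names are assumptions. The mean is computed here as a
circular mean by summing the edge vectors, which is one
reasonable reading of "weighted for edge element length" and not
necessarily what the EM does. Peak finding between local minima
and the least squares line fit are omitted.

```python
import math

def edge_angle(edge):
    """Direction of a directed edge in degrees, in the range 0..360."""
    (x0, y0), (x1, y1) = edge
    return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360.0

def edge_length(edge):
    (x0, y0), (x1, y1) = edge
    return math.hypot(x1 - x0, y1 - y0)

def angle_histogram(edges, bin_deg=20):
    """Histogram of edge angles, each edge weighted by its length."""
    bins = [0.0] * (360 // bin_deg)
    for edge in edges:
        # The modulo guards against an angle that rounds to exactly 360.
        bins[int(edge_angle(edge) // bin_deg) % len(bins)] += edge_length(edge)
    return bins

def weighted_mean_angle(edges):
    """Length weighted circular mean of the edge angles, in degrees.

    Each edge vector already has magnitude equal to the edge length,
    so summing the vectors weights every angle by length.
    """
    sx = sum(e[1][0] - e[0][0] for e in edges)
    sy = sum(e[1][1] - e[0][1] for e in edges)
    return math.degrees(math.atan2(sy, sx)) % 360.0
```

Given the edges contributing to each of the two chosen peaks,
weighted_mean_angle would supply the angle constraint for the
corresponding ribbon edge.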

Disambiguating  Ribbons 

Often there will be multiple ribbon descriptions
constructed for essentially the same region of an image. The EM
provides no explicit mechanism to apply heuristics to this
problem. The disambiguation problem requires comparisons
amongst ribbons, and is thus essentially an n^2 process, for n
ribbons. For individual heuristics, large savings in the constant
of proportionality can be made by saving partial results
between the n^2 comparisons (e.g. in the example given later, it is
prudent to precalculate all the centers of gravity). We have not
found a convenient mechanism to handle the most general case
efficiently; thus we have left the problem to special purpose
programming by the executive.


Some  Examples 

In this section we will discuss some heuristics which can
be used to direct the EM towards prescribed goals. It should be
noted that the goal direction is not explicit, but rather implicit
in the choice of heuristics and their parameters. We will use as
examples here the image of an L1011 airplane on the ground at
San Francisco airport shown in fig. 1, working with the edge
elements shown in fig. 2, and edge elements derived from an
aerial photograph of an airfield, shown in fig. 3. We will
consider the pictures to be 1 unit square in the following
discussion.

To increase the efficiency of the heuristics we hash the
edge elements on their angles at 1 degree intervals, and on a list
of their angles at 20 degree intervals and the x and y
coordinates of the edge beginning, to a resolution of one
sixteenth of the image. Only edges which are at least five pixels
long in the original image (512 by 512 pixels) are hashed. This
restricts the number of edge elements which might be considered
at all to 477 for the airplane scene and 170 for the airfield
scene.

The only root-test we used in these experiments passed
any edge element whose length was greater than some threshold,
or which had a successor in some super segment. Using a
threshold of 0.03 and considering all the edge elements, this
results in 232 search trees for the airplane scene, and 136 for the
airfield scene being expanded.


Fig. 3. Airfield edge elements.


We have so far experimented with three producer
heuristics. Recall that a producer is supplied with the most
recently linked edge element and possibly a direction for the
contour being linked. The simplest heuristic we have used,
called nssm, simply checks whether there is a super segment
which contains the previous edge element, and continues past it.
If so, that succeeding edge is returned, with a score of two. The
continuation heuristic tries to find an edge element which lies on
the linear extension of the previous edge. For example, if it is
given the edge defining the fuselage forward of the starboard
wing (see fig. 2) the continuation heuristic suggests the edge
behind the wing. Thus continuation is useful for following
straight edges which are broken by intrusions. There are a
number of parameters which can be adjusted: the variation
allowed between the angles of the two edge elements, the
distance the proposed continuation edge can start from the first
(normalized for the length of the first), and how close the
beginning of the new segment must actually be to the straight
line which continues the first edge element. The first two of
these are necessary to compensate both for the digital
approximations of where the edges are, and for errors
introduced by the line finding stage where good edges have
been slightly deflected by small disturbances at their ends,
incorrectly linked to them. The continuation heuristic uses the
hash tables described above. Multiple continuation edges which
meet all the above requirements may be found. The longest one
is returned, with a score between 10 and 20, dependent on its
length. The last heuristic we have used, get_nearby, looks for
edge elements which start near the end of the last edge in the
contour. There is a parameter which controls the range of
angles allowable for these edges. If the direction of the contour
is known, then the search is constrained to the appropriate side
of the last edge. Candidates are scored between 0.3 and 1.5,
depending on their length. A list of candidates is returned in
decreasing order according to their scores.

The reapers we have used have all been concerned with
the convexity of the contour so far generated. They check
whether the proposed new edge element follows a previously
established direction for the contour, and whether there is a
possible path from the end of the new edge element to the start
of the contour, which keeps the region so defined convex. All
these tests are loose, in the sense that concavities caused by
edges within parameter slop of being convex are ignored.

Figure 4 shows ribbons fitted to 199 of the 232 contours
produced by using the three producers nssm, continuation and
get_nearby, in that order, and the reapers described above (the
ribbon fitter failed on the remaining 33 contours, as they had
only a single peak when histogrammed on angles of the edge
elements). The parameter n_nodes was set at value 2 for this
and all experiments described here. Thus only two nodes of the
search tree were kept in the ordered list, and the search
terminated after completely developing two nodes. The effect of
this is that for most nodes, the first edge suggested by a
heuristic in the producers list is the one which was used in the
final solution. Thus super segments tend to get followed, unless
they violate the direction of the contour already established.
The second choice is for continuations of edges to be followed
wherever possible.
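
The geometric tests made by the continuation heuristic can be
sketched as below. This is an illustration under assumptions:
the function name, the parameter names and their default values
are hypothetical, candidates are passed in as a plain list rather
than looked up through the hash tables, and the mapping from
length to score is left out because the text does not give it.

```python
import math

def continuation(prev, candidates, max_angle_deg=10.0,
                 max_gap_ratio=1.0, max_offset=0.01):
    """Return the longest candidate lying on the extension of prev, or None.

    Edges are pairs of (x, y) end points, directed from first to second.
    """
    (px0, py0), (px1, py1) = prev
    length = math.hypot(px1 - px0, py1 - py0)
    ux, uy = (px1 - px0) / length, (py1 - py0) / length   # unit direction
    best, best_len = None, 0.0
    for cand in candidates:
        (qx0, qy0), (qx1, qy1) = cand
        dx, dy = qx1 - qx0, qy1 - qy0
        cand_len = math.hypot(dx, dy)
        if cand_len == 0.0:
            continue
        # Signed angle between the two edge directions.
        turn = math.degrees(math.atan2(ux * dy - uy * dx, ux * dx + uy * dy))
        # Distance of the candidate's start beyond the end of prev,
        # measured along prev, and its offset from the extension line.
        along = (qx0 - px1) * ux + (qy0 - py1) * uy
        offset = abs((qx0 - px1) * uy - (qy0 - py1) * ux)
        if (abs(turn) <= max_angle_deg
                and 0.0 <= along <= max_gap_ratio * length
                and offset <= max_offset
                and cand_len > best_len):
            best, best_len = cand, cand_len
    return best
```

Loosening max_angle_deg and max_offset corresponds to the looser
continuation parameters used for the airfield scene.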


It takes about thirty seconds CPU time on a KL-10 to
produce all the contours for Figure 4 and approximate them
with ribbons. It should be noted that the current implementation
still carries a lot of developmental baggage, and has not yet been
optimized. Clearly there are too many ribbons for a matching
phase to do anything useful. We have used five culling
heuristics to prune them. Setting the heuristic parameters gives
control over the types of contours retained.

The simplest culler discards contours with fewer than a
prescribed number of edges. In the experiments described here,
we demanded at least three edges be linked together. To ensure
that the contour bounds a region, we have a culler which
calculates the exterior angles of the contour, and checks that
their sum meets some threshold. Here we have used a threshold
of 160 degrees. The next culler estimates the area of the region
bounded by the contour, by filling in gaps between successive
edges with straight lines, and finding the area of the polygon.
Again this is thresholded. The final two cullers rely on
histogramming the edges of the contour, by their angles
weighted for length. The first histograms modulo 180 degrees,
finds the best peak and checks that it is greater than some
threshold. This heuristic tends to catch contours which don't
really bound a well defined region, but have managed to get
past the exterior angle check. For instance, a contour made up
of a wing tip, the trailing edge of the wing, the forward part of
the opposite side of the fuselage, and the nose cone, passes the
exterior angle check, but not the histogram check. Also, if the
threshold is greater than 0.5 (as in all the experiments described
here) it tends to discard ribbons which are not longer than they
are wide. The last heuristic histograms weighted angles for a
full 360 degrees, and then checks the angular separation of the
two best peaks. This can be used to ensure parallel sided
ribbons. Another useful heuristic would be a length to width
test. We have not used that in these experiments.
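
The first three cullers can be sketched as predicates of a
contour and its direction. The representation of a contour as a
list of directed edges (pairs of (x, y) points), the function
names, and the use of the absolute value of the angle sum (so
that the sign convention for 'left' and 'right' need not be
fixed) are assumptions of this illustration.

```python
import math

def turn_sum_deg(contour):
    """Sum of the signed exterior angles between successive edges."""
    total = 0.0
    for (a0, a1), (b0, b1) in zip(contour, contour[1:]):
        ax, ay = a1[0] - a0[0], a1[1] - a0[1]
        bx, by = b1[0] - b0[0], b1[1] - b0[1]
        total += math.degrees(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return total

def enclosed_area(contour):
    """Area of the polygon obtained by joining successive edges,
    and the last edge back to the first, with straight lines."""
    pts = [p for edge in contour for p in edge]
    twice = sum(x0 * y1 - x1 * y0
                for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]))
    return abs(twice) / 2.0

def make_cullers(min_edges, min_turn_deg, min_area):
    """Build culler predicates, cheapest first; each takes (contour, direction)."""
    return [
        lambda c, d: len(c) >= min_edges,
        lambda c, d: abs(turn_sum_deg(c)) >= min_turn_deg,
        lambda c, d: enclosed_area(c) >= min_area,
    ]

def cull(contours, cullers):
    """Retain the (contour, direction) pairs that satisfy every culler."""
    return [(c, d) for c, d in contours if all(test(c, d) for test in cullers)]
```

For example, make_cullers(3, 160.0, 0.03) corresponds to the edge
count and angle thresholds quoted above together with an area
threshold of three percent of a unit square image.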


Fig.  5.  Large  regions. 


Fig.  6.  Lower  area  threshold.

With the methods we have described one often gets
multiple ribbons for a single region of the actual image. This
occurs because many edges which appear in a single contour are
picked as root nodes for further tree searches, and so produce
contours which are subsets of an already found larger contour.
Nevatia [1974] also had this problem in fitting generalized cone
descriptions to three dimensional laser range data. He discusses
methods for choosing which description to use, based on the
amount of shared boundary. So far the only disambiguator we
have used is based on area comparison. It simply compares two
ribbons (rather than contours) and if the center of gravity of
each falls within the boundary of the other, then the ribbon
corresponding to the contour with lowest score attached to its
search node is discarded. The heuristic described here
eliminates most of those types of multiple description.
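
A minimal sketch of such a disambiguator follows. It is not the
authors' code. Ribbons are represented here as (polygon, score)
pairs, where the polygon is the list of ribbon corner points and
is assumed to have non-zero area. The centers of gravity are
computed once before the pairwise comparisons. Processing the
ribbons in decreasing score order is an assumption; the text does
not say how chains of overlapping ribbons are resolved.

```python
def polygon_centroid(poly):
    """Center of gravity of a simple polygon given as a vertex list."""
    twice_area = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)

def contains(poly, point):
    """Ray casting test: is the point inside the polygon?"""
    x, y = point
    inside = False
    for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
        if (y0 > y) != (y1 > y) and x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
            inside = not inside
    return inside

def disambiguate(ribbons):
    """Drop the lower scoring ribbon of any pair whose centers of
    gravity each fall inside the other ribbon."""
    centres = [polygon_centroid(poly) for poly, _ in ribbons]   # computed once
    order = sorted(range(len(ribbons)), key=lambda i: ribbons[i][1], reverse=True)
    kept = []
    for i in order:
        clash = any(contains(ribbons[k][0], centres[i])
                    and contains(ribbons[i][0], centres[k]) for k in kept)
        if not clash:
            kept.append(i)
    return [ribbons[i] for i in sorted(kept)]
```

The comparison is still quadratic in the number of ribbons, but
each centroid is computed only once.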

Figure 5 shows the results of applying the above cullers
and disambiguating heuristic to the contours which produced
the ribbons in figure 4. The cullers were parameterized to look
for large regions, with almost parallel main edges. This was
done by setting the area threshold to three percent of the total
picture, and the size of the single peak heuristic to 0.7, and by
confining the distance between the two peaks to be within 10
degrees of 180 degrees. The cullers reduce the number of
contours to 12, and the disambiguator to the final 6.

Figure 6 shows the same image with the area threshold
lowered to half of one percent of the total image area. The
inadequacy of the area based disambiguator can be seen here,
where there are ribbons of largely varying areas. Figure 7 shows
the airfield from figure 3 processed with precisely the same
parameters. The runways are picked out much better (see figure
8) when the continuation heuristic has its parameters loosened,
as the line finder introduces large errors for some edges which
actually correspond to a single broken straight line. A heuristic
for finding anti-parallel lines to link would also improve
performance.


Fig.  7.  Airfield. 

Finally we give an example where finer work has been
done looking for more detail in the airplane picture. We assume
that the executive has found a tentative match for an airplane
from the ribbons displayed in figure 5, and wants to look for
rear stabilizers to confirm the match, and engine pods to
identify the airplane type. The results of these two procedures are
shown in figure 9, superimposed on the ribbons shown in figure
5. We manually played the role of the executive in this example.
Given the results of figure 5, the EM was set going once again,
but the edge list of candidate root nodes was restricted to those
starting near the rear of the fuselage. On the basis of the wing
and fuselage areas, new ranges were set for the area of
acceptable ribbons, and the condition on the parallelness of the
main edges of ribbons relaxed because of the known shape of
the stabilizers. The rest of the heuristics and parameters were
unchanged.

For the engine pods, the area parameters were changed
once more and the continuation heuristic was removed from the
producers list. The search was restricted to the leading edges of
the two wings and the rear of the fuselage. An automatic
executive could use the results of such a search to help identify
an airplane type. For this particular image, removing the
continuation heuristic did not alter the results; however, an
automatic executive might decide that the engine pods are small
and unlikely to have their edges interrupted by intrusions.
Removing the heuristic reduces the possibility of making
erroneous links to edges from nearby objects such as trucks. It
may seem strange that the small truck next to the starboard
engine pod was not found in the engine search. It was
considered but discarded by the single peak heuristic described
earlier. If it had been elongated rather than square (or if we
were looking for squarer engine pods) it would have been
retained. Higher level processing in the executive would have
then been responsible for deciding how to fit the candidate
engine to the model.


Fig.  8.  Airfield  with  looser  continuations 


Discussion 

Although the experiments so far have been limited to a
small number of images, we have found the algorithms and
heuristics to be fairly stable over a wide range of parameter
values. Often variations of 50 or 100 percent of a particular
parameter produce only minimal changes in the ribbons
identified. It thus seems promising that a reasoning system
which is to select heuristics for the EM will be able to use fairly
qualitative reasoning in deciding between a small number of
values for each parameter. Furthermore, it should be possible to
select heuristics which will be valid for a range of images for
which only an approximate camera model (and thus ribbon size)
is known.

Already the system seems powerful enough for special
purpose executives to be written to identify certain classes of
scenes. The ACRONYM project has more general goals,
however. The predictor and planner will be a general purpose
executive, 'programmed' by the generic and specific models of
objects it is trying to locate in images. The predictor and
planner will thus require knowledge of how volumetric elements
and their relations will affect the type of heuristics which should
be used for edge linking and region finding. It will need to
know how to translate the explicit goals it can deduce from the
models into a choice of the correct heuristics and parameters.
Knowledge of this sort will be incorporated into the rule set
which it uses for reasoning (see Binford and Brooks [1979]).
First, however, we will need to experiment further, to discover
for ourselves how to better predict the heuristics to use, on the
basis of what is expected in an image. Some of the reasoning
which will be required was given during the explanation of the
examples. This reasoning will have to be mechanised. Also, we
may find it necessary to increase the number of heuristics from
the current set. For instance, an anti-parallel heuristic should
improve the performance on runway and roadway type scenes.


Fig.  9.  An  L1011.

It should also be noted that ACRONYM includes very
powerful methods for disambiguation and identification in the
matcher - using the symbolic descriptions provided by the
Observability Graph. Thus, while the results presented in this
paper seem very promising, the total system should be able to
perform reasonably with an edge mapping module which
doesn't meet our current expectations.


Abstract.  Shaded  overlays  for  maps  give  the  user  an
immediate  appreciation  for  the  surface  topography  since 
they  appeal  to  an  important  visual  depth  cue.  A brief 
review  of  the  history  of  manual  methods  is  followed  by  a 
discussion  of  a number  of  methods  that  have  been 
proposed  for  the  automatic  generation  of  shaded  overlays. 
These  techniques  are  compared  using  the  reflectance  map 
as  a common  representation  for  the  dependence  of  tone 
or  gray  level  on  the  orientation  of  surface  elements. 


Figure  1:  Block  diagram  of  a system  for  the  generation 
of  relief  shading.  The  grey-value  is  calculated 
by  applying  the  reflectance  map  to  the 
gradient  estimate  obtained  by  sampling 
neighboring  points  in  the  digital  terrain  model. 


Introduction 

Of  several  ways  of  depicting  surface  form  on  maps, 
hill-shading  has  the  most  immediate  appeal  and  provides 
for  quick  comprehension  of  the  topography.  In  this 
sense,  hill-shading  is  complementary  to  the  use  of 
contours,  which  provide  accurate  terrain  elevations  but 
require  careful  scrutiny  if  one  is  to  ascertain  the  surface 
form.  Shaded  maps  are  most  important  when  the 
interpreter’s  time  is  limited,  as  in  aviation,  for  users  that 
are  not  trained  cartographers  and  for  small  scale  maps,
where  contours  degenerate  into  messy  tangles  of  lines. 

Why  then  do  we  not  see  more  shaded  maps?  One 
reason  is  the  expense  of  present  manual  methods  of 
production,  which  require  skilled  artists  with  good  insight 
into  cartography.  Working  from  existing  contour  maps, 
ridge  and  stream  lines  extracted  from  such  maps  and  at 
times  aided  also  by  aerial  photography,  they  wield 
airbrushes  in  what  is  a slow,  tedious  and  imprecise 
operation.  Attempts  at  automation  began  with  the 
notion  that  the  gray  levels  used  in  the  shading  should 
derive  from  a model  of  how  light  might  be  reflected  from
a surface.  Ignoring  shadowing  and  mutual  illumination
effects,  it  seems  clear  that  the  reflected  intensity  will  be
a function  of  the  local  surface  inclination.  The  choice  of
a method  for  calculating  the  gray  tone  based  on  the
orientation  of  each  surface  element  has  however  been  the 
subject  of  occasionally  bitter  controversy  for  almost  two 
centuries.  Much  of  the  difficulty  stems  from  a lack  of  a 
common  representation  that  would  allow  comparison  of 
methods  which  appear  at  first  glance  to  be  incomparable. 

The  recently  developed  reflectance  map  constitutes 
such  a common  denominator.  It  is  a simple  device 
developed  originally  for  work  in  machine  vision  where 
one  is  interested  in  calculating  surface  shape  from  the 
gray  levels  in  an  image.  This  is  clearly  just  the  inverse 
of  the  problem  of  producing  shaded  pictures  from  a 
surface  model.  The  reflectance  map  is  a plot  of  apparent 
brightness  versus  two  variables,  namely  the  slope  of  the 
surface  element  in  the  West-to-East  direction  and  the 
slope  in  the  South-to-North  direction.  Producing  a 
shaded  overlay  for  a map  then  is  simply  a matter  of 
calculating  these  two  slopes  for  each  surface  element  and 
looking  up  the  appropriate  gray  level  in  the  reflectance 
map  (see  Fig.  1).  This  is  a very  simple,  local 
computation  that  can  be  carried  out  efficiently  even  on 
enormous  data-bases.  The  resulting  gray  levels  can  then 
be  fed  to  a graphic  output  device  that  will  produce  a 
continuous  tone  or  halftone  photographic  transparency 
from  the  given  stream  of  numbers. 
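
The computation just described can be sketched in a few lines.
This is an illustration under assumptions, not a method from the
paper: the terrain model is a list of rows of elevations with
row 0 taken to be the northern edge, slopes are estimated by
central differences, and example_reflectance is a hypothetical
reflectance map (a clipped linear function of the slope in the
direction away from a light source assumed to lie to the
northwest).

```python
import math

def slopes(z, r, c, dx=1.0, dy=1.0):
    """Estimate the two slopes at an interior grid point (r, c)."""
    p = (z[r][c + 1] - z[r][c - 1]) / (2.0 * dx)   # West-to-East slope
    q = (z[r - 1][c] - z[r + 1][c]) / (2.0 * dy)   # South-to-North slope
    return p, q

def example_reflectance(p, q):
    """Hypothetical reflectance map returning a gray level in 0..1.

    The slope in the direction away from a northwest light source is
    (p - q) / sqrt(2); the gray level rises linearly with it.
    """
    s = (p - q) / math.sqrt(2.0)
    return max(0.0, min(1.0, 0.5 + 0.5 * s))

def shade(z, reflectance=example_reflectance, dx=1.0, dy=1.0):
    """Gray levels for the interior points of a digital terrain model."""
    rows, cols = len(z), len(z[0])
    return [[reflectance(*slopes(z, r, c, dx, dy))
             for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]
```

Any other reflectance map can be compared simply by passing a
different function of the two slopes to shade.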


One  important  question  that  has  to  be  settled  is 
what  reflectance  map  is  to  be  used.  Careful  comparison 
of  more  than  a dozen  proposed  shading  methods  shows 
that  some  of  the  simplest  provide  a good  impression  of 
the  shape  of  the  surface.  These  experiments  also  show 
that  the  most  commonly  used  assumptions  about  surface 
reflectance  do  not  lead  to  the  best  results,  while  simple 
monotonic  functions  of  the  surface  slope  in  the  direction 
away  from  the  assumed  light  source  work  admirably. 
What  matters  is  the  visual  impression,  not  theoretical 
rules  [1].  One  goal  of  this  paper  is  a review  of  various 
hill-shading  methods  that  have  been  proposed  in  the  past 
in  terms  of  their  reflectance  maps. 

Early  History  Of  Hill-Shading 

Shading  has  been  used  in  hand-drawn  maps  for 
many  centuries.  Leonardo  da  Vinci  put  it  to  good  effect 
in  his  maps  of  Toscana,  drawn  in  1502  and  1503,  that 
contained  oblique  views  of  relief  forms  illuminated  from 
the left [1]. Wood-cuts of the area around Zurich in
Switzerland  drawn  half  a century  later  by  Murer  use 
shaded  side-views  as  well.  Overhead  views  using  relief 
shading  appear  for  the  first  time  in  maps  of  the  same 
area  drawn  a century  after  that  by  Gygers,  but  these 
then gave way to less desirable forms [1].

The choice of the representation for relief forms
depends to a great extent on the available reproduction
technology.  Woodcuts  and  engraving  methods  lend 
themselves  to  linear  forms,  where  brightness  of  an  area  in 
the  reproduction  is  controlled  by  the  spacing  and  width 
of  darkened  lines.  Useful  directional,  textural  effects  can 
be  generated  by  orienting  these  line  fragments,  or 
hachures  (Schraffuren),  along  lines  of  steepest  descent. 
Crowding  of  such  lines  in  steep  areas  may  have  given  rise 
to notions of "steeper implies darker".

Lehmann proposed the first rigorous relationships
[2,3] between surface slopes and quantities measurable on
the printed map. In 1799, when his method
(Böschungsschraffen) was published anonymously, the
techniques for measuring the surface accurately at a large
enough number of points did not exist. Results of this
first  method  of  illustrating  shape  are  in  some  ways 
analogous  to  those  one  might  obtain  by  illuminating  a 
model  of  the  surface  from  above,  an  arrangement  that 
gives  rise  to  images  that  are  not  easy  to  interpret. 

Partly  as  a result  of  this,  an  alternate  form 
(Schattenschraffen) evolved [4–6], in which the line
thickness is varied according to the orientation of the
local surface patch with respect to a light source usually
assumed to be near the top left of the map, when it is
oriented  properly  for  viewing.  For  maps  with  North  at 
the  top  this  corresponds  to  North-West.  Surface  patches 
sloping  down  in  that  direction  are  portrayed  with  a light 
tone,  while  those  sloping  up  in  that  direction  get  a dark 
tone.  Since  flat  areas  have  no  lines  of  descent,  they 
remain  white.  This  effect  is  unfortunate  because  it  is  the 
only  departure  from  what  would  be  seen  if  a diffusely 
reflecting  model  was  illuminated  obliquely.  This  makes 
such  maps  a little  difficult  to  interpret.  They  are 
nevertheless  superior  to  those  made  by  the  earlier 
method,  as  evidenced  for  example  by  the  "Dufourkarte" 
of  Switzerland  made  between  1842-1864  using  this 
approach  [1].  These  methods  for  portraying  surface  shape 
preceded  the  widespread  use  of  contours  [7],  in  part 
because  the  latter  require  detailed  surface  measurements 
that  were  not  available  before  the  advent  of 
photogrammetry. 

While  lithography  was  invented  by  Alois  Senefelder 
in  1796,  it  found  little  application  in  cartography  until 
around 1850. It permitted the production of
multicolored maps, but more importantly, led to the use
of halftones, destined to ultimately replace lines as a
means of modulating the average reflectance in the
printed map. W. H. Fox Talbot invented a
photomechanical halftone process in 1852, but
commercial success came only years after the patenting of
the halftone screen by Frederick von Egloffstein in 1865,
and the crossline screen of William A. Leggo in 1869.

Having access to these new reproduction schemes,
Wiechel [8] developed shading methods
(Schräglichtschummerung) to replace the use of hachures
as described above. His fundamental paper, based in part
on work by Burmester [9] on shaded pictures of regular
surfaces, placed the field of hill-shading on a sound
foundation. Wiechel discovered the error regarding flat
surfaces, for example, and developed a graphic method
for  determining  the  gray  value  from  contour  interval  and 
direction.  Unfortunately,  the  means  for  controlled 
generation  of  halftones  in  response  to  surface  orientation 
did  not  then  exist  and  his  work  was  ignored  for  a long 
time. 


Hill-Shading  In  This  Century 

Two  methods  based  on  lines,  this  time  contours 
instead  of  lines  of  steepest  descent,  were  explored  by 
Tanaka  in  the  1930's.  His  first  method  used  the  lines  of 
intersection  of  the  terrain  surface  with  uniformly  spaced, 
parallel,  inclined  planes  [10,11].  Tanaka’s  initiative  gave 
rise to considerable discussion [12–19], partly in the form
of  an  acrimonious  debate  [20—23].  His  other  method 
was  based  on  portrayal  of  a terraced  model  of  the  terrain 
[24—26],  an  approach  that  had  been  used  previously, 
unguided  by  his  careful  analysis  [27—30].  While  line- 
based  methods  give  rise  to  beautiful,  easy  to  interpret 
maps,  they  cannot  show  the  fine  detail  of  surface 
topography  possible  with  halftones  and  must  be  based  on 
smoothed,  generalized  information  such  as  contours. 
These  lines  also  tend  to  interfere  with  others  used  to 
portray  planimetric  information. 

A shaded  overlay  can  also  be  produced  by 
photographing  an  appropriately  illuminated  scaled  model 
of  the  surface.  If  this  model  has  a matte  or  diffusely 
reflecting  surface,  a map  overlay  of  high  quality  will 
result  provided  attention  is  paid  to  the  projection 
geometry.  While  this  was  an  approach  taken  early  on 
[27],  it  really  only  became  practical  in  the  1950’s  with  the 
introduction  of  milling  machines  that  allow  an  operator 
to  carve  a model  by  tracing  contours  on  an  existing  map 
[31—38].  This  is  still  an  expensive,  slow  process  however, 
in  part  because  of  the  manual  work  required  to  smooth 
out  the  resulting  "terraced"  model. 

The  Swiss  school  of  cartography  improved  on  earlier 
forms  [28,29,30,39—41]  and  developed  shading  to  a fine 
art,  producing  numerous  outstanding  maps  in  this  time 
[42–50]. Imhof argues that automated methods, such as
relief  model  photography,  cannot  produce  results  nearly 
as  impressive,  since  the  cartographer  cannot  easily 
influence  the  process  [1].  The  manual  shading  method  is 
however  slow  and  expensive,  and  consequently  has  not 
been  used  except  for  small  areas  and  those  of  particular 
interest  or  military  importance.  One  cannot  expect,  with 
significant  areas  of  the  world  still  not  mapped  at  large 
scales,  and  the  rising  cost  of  labor,  that  shaded  overlays 
produced  this  way  will  be  used  in  many  maps. 

Yoeli [51–59] saw the potential of the digital
computer  in  dealing  with  this  dilemma.  It  is  possible  to 
implement  Wiechel’s  method  based  on  oblique 
illumination  of  a diffusely  reflecting  surface  if  terrain 
elevations  can  be  read  into  a computer  and  suitable 
continuous tone output devices are available. Yoeli was
hampered  by  the  lack  of  such  devices  at  that  time. 
Blachut  and  Marsik  tried  to  simplify  the  required 
calculations  to  the  point  where  a computer  might  not 
even  be  required  [60,61].  Peucker  helped  popularize  the 
whole  idea  of  computer-based  cartography  [19,62—64], 
and  found  a piece-wise  linear  approximation  to  the 
equation  for  the  brightness  of  a diffuse  reflector  that 
works well [63]. Many other interesting reports appeared
during this time on the subject of hill-shading, too
numerous to mention individually [65–72].

Brassel [73–77] took Imhof's admonitions seriously
and  tried  to  implement  as  much  as  had  been  formalized 
of  the  "Swiss  manner".  With  the  output  devices  available 
to  him  at  that  time  it  was  not  easy  to  judge  whether  the 
added  complexity  was  worth  the  effort.  All  of  these 
computer  based  methods  require  detailed  digital  terrain 
models.  The  storage  capacity  and  techniques  for 
handling  this  kind  of  information  now  exist  [33,78—84], 
as  do  the  photographic  output  devices  needed.  There  has 
been  significant  progress,  too,  in  the  automation  of  the 
generation  of  digital  terrain  models  directly  from  aerial 
photographs  [84—91],  partly  as  a byproduct  of  work  on 
orthophoto  generation  [92—95].  More  compact  and 
appropriate  representations  for  these  terrain  models  are 
under  investigation  [96—99],  as  are  alternate  methods  for 
relief portrayal such as block diagrams [100–103].

Considerable  progress  has  been  made  recently  in  the 
computer  graphics  area  in  the  portrayal  of  regular  objects 
with  simple  surfaces  [104—115].  Early  models  for  the 
reflection  of  light  from  matte  surfaces  [116—119]  are 
being  elaborated,  including  some  for  the  material  on  the 
lunar  surface  [120—129].  In  this  context,  work  on 
models  of  the  microstructure  of  surfaces  is  relevant 
[130—135].  In  a recent  effort  in  the  machine  vision  area, 
a method  was  developed  for  portraying  the  dependence  of 
brightness  on  surface  orientation  using  the  so-called 
reflectance  map  [136—140].  The  reflectance  map  can  be 
determined  if  the  detailed  geometric  dependence  of 
reflection  from  the  surface  [141,142]  and  the  distribution 
of  light  sources  are  known.  Alternatively,  it  can  be 
found  empirically,  or  derived  directly  by  analyzing  the 
interaction of light rays with the surface microstructure.

As  a result  of  the  development  of  the  reflectance 
map,  the  availability  of  detailed  digital  terrain  data,  small 
computers  able  to  perform  the  simple  calculations 
required  and  geometrically  accurate  gray-level  output 
devices,  we  may  say  that  automatic  hill-shading  has  come 
of  age. 


Digital  Terrain  Models 

For  many  applications  of  cartographic  data  it  is 
useful  to  have  machine-readable  surface  representations. 
Such  terrain  models  are  used  for  example  in  the  design  of 
roads  and  in  order  to  determine  the  region  irradiated  by 
a radio  frequency  antenna.  Initially,  digital  terrain 
models  were  generated  manually  by  interpolation  from 
existing  contour  maps.  This  is  a tedious,  error-prone 
process  producing  a digitized  version  of  the  surface 
represented by the contours, which in turn is a smoothed,
generalized  version  of  the  real  surface. 

The  contour  information  on  topographic  maps  is 
produced  by  manual  scanning  of  stereo  pairs  of  aerial 
photographs.  Today,  fortunately,  stereo-comparators 
often  come  equipped  with  coordinate  readouts  that  allow 
the  extraction  of  information  needed  for  the  generation 
of  digital  terrain  models  [84].  Conveniently  taken  during 
orthophoto  generation  [92—95],  the  data  tends  to  be 
accurate  and  detailed.  Even  more  exciting  is  the  prospect 
for  machines  that  achieve  stereo  fusion  without  human 
help  [84—91],  since  they  will  lead  to  the  automatic 
production  of  digital  terrain  models.  In  the  past  such 
machines  had  difficulties  dealing  with  uniform  surfaces 
such  as  lakes,  featureless  surfaces,  large  slopes,  and  depth 
discontinuities,  as  well  as  broken  surfaces,  such  as  forest 
canopies.  This  is  apparently  still  true  when  aerial 
photographs  are  used  with  disparities  large  enough  to 
ensure  high  accuracy. 

Various  representations  can  be  chosen  for  the 
surface  elevation  information.  Series  expansion,  a 
weighted  sum  of  mathematical  functions  such  as 
polynomials,  Gaussian  hills  or  periodic  functions  may  be 
used.  These  tend  to  be  expensive  to  evaluate  however 
and  not  accurate  in  approximating  surfaces  that  have 
slope  discontinuities.  This  is  important  for  many  types  of 
terrain,  at  all  but  the  largest  scales.  Perhaps  the  simplest 
surface representation is an array of elevations, $\{z_{ij}\}$,
based on a fixed grid, usually square. Determining the
height at a particular point is simple and the interchange
of terrain models between users is easy since the format
is so trivial. One disadvantage of this kind of surface
representation  is  its  high  redundancy  in  areas  where  the 
surface  is  relatively  smooth.  The  illustrations  in  this 
paper  are  based  on  digital  terrain  models  consisting  of 
arrays  of  elevation  values. 

Methods  that  achieve  considerable  data  compression 
by  covering  the  surface  with  panels  stretched  between 
specially chosen points have been developed [97–99].
These  exploit  the  fact  that  real  geographical  surfaces  are 
not  arbitrary  sets  of  elevations  but  have  definite  structure 
and  regularity.  Such  representations  may  ultimately 
replace  the  simpler,  more  voluminous  ones,  if  users  can  be 
persuaded  to  accept  the  greater  programming  complexities 
involved.

Digital  terrain  models  may  also  be  referred  to  as 
digital  elevation  models  if  they  contain  no  information 
other  than  the  elevation  values. 


The  Reflectance  Map 

The  human  visual  system  has  a remarkable  ability 
to  determine  the  distance  to  objects  viewed,  as  well  as 
their  shape,  using  a variety  of  depth  cues.  One  such  cue 
is  shading,  the  dependence  of  apparent  brightness  of  a 
surface  element  on  its  orientation  with  respect  to  the 
light  source(s)  and  the  viewer.  Without  this  particular 
depth  cue  we  would  be  hard  pressed  to  interpret  pictures 
of  smooth,  opaque  objects  such  as  people,  since  other 
cues  like  stereo  disparity  and  motion  parallax  are  absent 
in  a flat,  still  photograph.  It  can  be  shown  that  shading 
contains  enough  information  to  allow  the  observer  to 
recover  the  shape.  In  fact,  a computer  program  has  been 
developed that can do this using a single digitized image
[136].

Such work in the area of machine vision has led to
a need to model the image-forming process more carefully
[137]. The input to the visual-sensing system is image
irradiance,  which  is  proportional  to  scene  radiance  (here 
loosely  called  apparent  brightness)  [139].  Scene  radiance 
in  turn  can  be  related  to  the  underlying  geometric 
dependence  of  reflectance  of  the  surface  material  and  the 
distribution  of  light  sources  [141,142].  Here  we 
concentrate  on  the  dependence  of  scene  radiance  on  the 
orientation of the surface element. Shaded overlays for
maps  are  interpreted  by  the  viewer  using  the  same 
mechanism  normally  employed  to  determine  the  shape  of 
three-dimensional  surfaces  from  the  shading  found  in 
their  images.  Thus  shaded  overlays  should  be  produced 
in a way that emulates the image-forming process, one in
which  brightness  depends  on  surface  orientation.  This  is 
why  the  reflectance  map,  which  captures  this  dependence, 
is  useful  in  this  endeavor. 

Consider a surface $z(x, y)$ viewed from a great
distance above (see Fig. 2). Let the x-axis point to the
East, the y-axis North and the z-axis straight up. The
orientation of a surface element can be specified simply
by giving its slope, p, in the x (West-to-East) direction
and its slope, q, in the y (South-to-North) direction. The
slopes p and q are the components of the gradient vector,
(p, q). The apparent brightness of a surface element,
R(p, q), depends on its orientation, or equivalently, the
local  gradient.  It  is  convenient  to  illustrate  this 
dependence  by  plotting  contours  of  constant  apparent 
brightness  on  a graph  with  axes  p and  q.  This  reflectance 
map  [137]  provides  a graphic  illustration  of  the 
dependence  of  apparent  brightness  on  surface  orientation. 
The pq-plane, in which the reflectance map is drawn, is
called  the  gradient  space,  because  each  point  in  it 
corresponds  to  a particular  gradient. 

Figure 2: Coordinate system and viewing geometry. The
viewer  is  at  a great  distance  above  the  terrain 
so  that  the  projection  is  orthographic. 


Surface  orientation  has  two  degrees  of  freedom. 
We  have  chosen  here  to  specify  the  orientation  of  a 
surface  element  by  the  two  components  of  the  gradient. 
Another  useful  way  of  specifying  surface  orientation  is  to 
find  the  intersection  of  the  surface  normal  with  the  unit 
sphere.  Each  point  on  the  surface  of  this  Gaussian 
sphere  again  corresponds  uniquely  to  a particular  surface 
orientation. If the terrain is single-valued, with no
overhangs,  all  surface  normals  will  point  more  or  less 
upwards  and  pierce  the  Gaussian  sphere  in  a hemisphere 
lying  above  an  equator  corresponding  to  the  horizontal 
plane.  Gradient  space  happens  to  be  the  projection  of 
this  hemisphere  from  the  center  of  the  sphere  onto  a 
plane  tangent  at  the  upper  pole. 
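
The relationship between the two representations can be made concrete with a short sketch. It assumes the upward unit normal of a surface with gradient (p, q) is proportional to (-p, -q, 1); the function names are hypothetical. Projecting that normal from the center of the sphere onto the plane z = 1 gives the point (-p, -q), so the tangent plane reproduces gradient space up to a reflection through the origin.

```python
import math


def normal_from_gradient(p, q):
    """Upward unit normal of a surface element with slopes p (East) and q (North)."""
    length = math.sqrt(1.0 + p * p + q * q)
    return (-p / length, -q / length, 1.0 / length)


def gradient_from_normal(nx, ny, nz):
    """Recover (p, q) by central projection of the normal onto the plane z = 1.

    Only normals in the upper hemisphere (nz > 0) correspond to a finite
    gradient; points on the equator would be projected to infinity.
    """
    if nz <= 0.0:
        raise ValueError("normal must point into the upper hemisphere")
    return (-nx / nz, -ny / nz)
```

Level ground maps to the upper pole of the sphere and to the origin of gradient space, while very steep slopes approach the equator and are pushed far from the origin.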

While  we  will  not  use  this  representation  in  the 
calculation  of  relief  shading,  it  is  helpful  in  understanding 
previous  attempts  at  graphical  portrayal  of  the 
dependence  of  apparent  brightness  on  surface  orientation. 
The  first  such  method  was  developed  by  Wiechel  more 
than  a century  ago  [8].  His  brilliant  analysis  appears  to 
have  been  largely  ignored  partly  because  it  depended  on 
mathematical  manipulations  that  may  have  been 
inaccessible  to  many  of  the  intended  users.  Later, 
Tanaka  invented  another  method  showing  the  variation 
of  apparent  brightness  with  surface  gradient  [10,11,24,23]. 


This  second  precursor  of  the  reflectance  map  also  appears 
to  have  found  little  following. 

Position  Dependent  Effects 

Since  the  reflectance  map  gives  apparent  brightness 
as  a function  of  local  surface  gradient  only,  it  does  not 
take  into  account  effects  dependent  on  the  position  of  the 
surface  element.  One  such  effect  is  illumination  of  one 
surface  element  by  another.  Fortunately  this  mutual 
illumination  effect  is  small  unless  surface  reflectance  is 
quite high [137]. It is not known whether mutual
illumination  effects  aid  or  hinder  the  perception  of 
surface  shape.  They  are  difficult  to  calculate  and  so 
have  not  been  emulated  in  work  on  hill-shading. 

Another  position  dependent  effect  on  apparent 
brightness  is  the  blocking  of  light  by  one  portion  of  the 
surface  before  it  reaches  another.  Cast  shadows  can  be 
calculated  by  determining  which  surface  elements  are  not 
visible  from  the  point  of  view  of  the  light  source  [138]. 
Shadows  cast  by  one  complicated  shape  on  another  are 
hard  to  interpret  however  and  apparently  detract  from 
the  visual  quality  of  shaded  overlays  [1,36,37].  They  are 
thus  rarely  included. 

Scattering  of  light  by  air  molecules  and  aerosol 
particles  changes  the  apparent  brightness  of  a surface 
element  viewed  through  the  atmosphere.  The  brightness 
is  shifted  towards  a background  value  equal  to  the 
brightness  of  an  infinitely  thick  layer  of  air.  The 
difference between the brightness and the background
value  decreases  with  the  thickness  of  the  gaseous  layer 
through  which  the  surface  is  viewed  [143].  The  resulting 
reduction  in  contrast  as  a function  of  distance  is  referred 
to  as  aerial  perspective  and  can  be  a useful  depth  cue, 
although  there  is  no  general  agreement  that  it  aids  the 
perception  of  surface  shape.  It  has  been  used  at  times  by 
map-makers  and  can  be  modeled  easily  [1,74,76,77].  The 
effect  has  not  been  added  to  any  of  the  hill-shading 
schemes  presented  here  in  order  to  simplify  comparisons. 
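
For readers who do wish to experiment with aerial perspective, one simple way to model it is sketched below. The exponential decay of contrast with layer thickness is an assumption made for this illustration (the text above states only that the difference decreases with thickness), and the function name and the `extinction` parameter are hypothetical.

```python
import math


def aerial_perspective(brightness, background, thickness, extinction):
    """Shift a brightness value towards the background value.

    The difference between brightness and background is assumed to decay
    exponentially with the thickness of the intervening layer of air;
    extinction is a non-negative coefficient in inverse thickness units.
    """
    if thickness < 0.0 or extinction < 0.0:
        raise ValueError("thickness and extinction must be non-negative")
    contrast = math.exp(-extinction * thickness)
    return background + (brightness - background) * contrast
```

With orthographic viewing from above, the thickness would be larger for valleys than for peaks, so low-lying terrain would be drawn with reduced contrast.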

Where  Do  Reflectance  Maps  Come  From? 

A reflectance  map  may  be  based  on  experimental 
data.  One  can  mount  a sample  of  the  surface  in  question 
on  a goniometer  stage  and  measure  its  apparent 
brightness  from  a fixed  viewpoint  under  fixed  lighting 
conditions  while  varying  its  orientation.  Instead,  one  can 
take  a picture  of  a test  object  of  known  shape  and 
calculate the orientation of the corresponding surface
element for each point in the image. The reflectance
map  is  then  obtained  by  reading  off  the  measured 
brightness  there. 

Alternatively,  one  may  use  even  more  detailed 
information about light reflection from the surface. The
bidirectional  reflectance  distribution  function  (BRDF) 
describes  how  bright  a surface  will  appear  viewed  from 
one  specified  direction  when  illuminated  from  another 
specified  direction  [141,142].  By  integrating  over  the 
given  light  source  distribution  one  can  calculate  the 
reflectance  map  from  this  information  [139].  Crudely 
speaking,  the  reflectance  map  is  like  a "convolution"  of 
the  BRDF  and  the  source-radiance  distribution. 

Most  commonly,  reflectance  maps  are  based  on 
phenomenological  models,  rather  than  physical  reality. 
The so-called Lambertian surface, or perfect diffuser, for
example,  has  the  property  that  it  appears  equally  bright 
from  all  viewing  directions.  It  also  reflects  all  light, 
absorbing  none.  It  turns  out  that  these  two  constraints 
are  sufficient  to  determine  uniquely  the  BRDF  of  such  a 
surface,  and  from  it  the  reflectance  map,  provided  the 
positions  of  the  light  sources  are  also  given.  Some 
reflectance  maps  are  based  on  mathematical  models  of 
the  interaction  of  light  with  the  surface.  Such  models 
tend  to  be  either  too  complex  to  allow  analytic  solution 
or  too  simple  to  represent  real  surfaces  effectively. 
Nevertheless  some  have  come  quite  close  to  predicting  the 
observed  behavior  of  particular  surfaces  [133—135]. 
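
As an illustration of the Lambertian case, the sketch below evaluates the familiar cosine law for a single distant source: apparent brightness is taken to be the cosine of the angle between the surface normal, proportional to (-p, -q, 1), and the direction to the source, proportional to (-p_s, -q_s, 1), with negative values set to zero for elements turned away from the light. The function names and the compass-style azimuth convention (degrees clockwise from North) are assumptions of this example.

```python
import math


def source_gradient(azimuth_deg, elevation_deg):
    """Gradient (p_s, q_s) of the surface element that faces the source squarely.

    azimuth_deg is measured clockwise from North; elevation_deg is the
    angle of the source above the horizon and must be positive.
    """
    if not 0.0 < elevation_deg <= 90.0:
        raise ValueError("elevation must lie in (0, 90] degrees")
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (-math.sin(az) / math.tan(el), -math.cos(az) / math.tan(el))


def lambertian(p, q, ps, qs):
    """Cosine of the incident angle for gradient (p, q), clipped at zero."""
    numerator = 1.0 + ps * p + qs * q
    denominator = math.sqrt(1.0 + p * p + q * q) * math.sqrt(1.0 + ps * ps + qs * qs)
    return max(0.0, numerator / denominator)
```

For a source in the North-West at 45° elevation, `source_gradient(315.0, 45.0)` gives p_s = 1/√2 and q_s = -1/√2, up to rounding. The map then peaks at that gradient, and level ground receives a brightness of 1/√2 rather than white.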

Here,  new  reflectance  maps  will  be  determined, 
based  on  proposed  methods  for  producing  shaded  overlays 
for maps. Their derivation will not depend on an
understanding  of  the  image-formation  process  or  the 
physics  of  light  reflection.  Instead,  they  will  require  an 
analysis  of  how  the  brightness  of  a point  in  the  overlay 
depends  on  the  gradient  of  the  underlying  geographical 
surface. 

Which  reflectance  map  should  be  used?  The 
answer  to  this  question  must  depend  on  the  quality  of  the 
impression  a viewer  gets  of  the  shape  of  the  surface 
portrayed.  Various  methods  for  producing  shaded 
overlays  can  be  compared  by  evaluating  sample  products 
and  classified  according  to  the  corresponding  reflectance 
maps.  It  will  become  apparent  that  in  this  way  general 
conclusions  can  be  drawn  about  a new  method  just  by 
inspecting  its  reflectance  map. 


Normalization  of  Gray  Tone 

A picture  made  by  applying  varying  amounts  of 
light-absorbing substances, such as ink, to an opaque,
diffusely reflecting material like paper, has a limited
dynamic  range.  Reflectance  is  limited  at  the  low  end  by 
the  properties  of  the  ink  and  at  the  high  end  by  the 
paper,  which  will  at  most  reflect  all  the  light  incident 
upon  it,  unless  it  fluoresces.  The  diffuse  reflectance  is 
thus  always  less  than  or  equal  to  one.  Similarly,  if 
absorbing  substances  are  used  on  a transparent  substrate, 
a limit  applies,  since  transparency  cannot  be  larger  than 
one.

The  problem  of  fitting  a given  image  into  the 
available  dynamic  range  is  fundamental  to  all  methods  of 
reproduction.  A normalization  is  applied  so  that  the 
maximum  apparent  brightness  to  be  reproduced  is 
represented  by  a reflectance  of  one  (or  whatever  the 
maximum  is  for  the  paper  being  used).  This  scaling  will 
have  to  be  applied  whenever  relief  shading  is  based  on 
models  of  image-formation  by  light  reflected  from  the 
terrain  surface. 
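
A minimal sketch of such a normalization follows. It assumes the computed brightness values are held in a list of rows; the function name and the `max_reflectance` parameter are hypothetical, and negative inputs are simply clipped to zero.

```python
def normalize_gray(brightness, max_reflectance=1.0):
    """Scale brightness values so that the largest maps to max_reflectance.

    brightness is a list of rows of non-negative numbers. The ratios
    between values are preserved; only the overall scale changes.
    """
    peak = max(max(row) for row in brightness)
    if peak <= 0.0:
        # Nothing to scale: the whole picture is black.
        return [[0.0 for _ in row] for row in brightness]
    scale = max_reflectance / peak
    return [[max(0.0, value) * scale for value in row] for row in brightness]
```

In practice the peak might be taken from the reflectance map itself rather than from one particular picture, so that adjoining map sheets receive the same scaling.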


Gradient  Estimation 

The  apparent  brightness  of  a surface  element 
depends  on  its  orientation  with  respect  to  the  viewer  and 
the  light  source.  The  orientation  of  the  surface  element 
is  described  fully  by  a surface  normal,  or  equivalently  by 
the  gradient.  The  components  of  the  gradient  are  the 
slopes  p (in  the  West-to-East  direction)  and  q (in  the 
South-to-North  direction).  These  slopes  have  to  be 
estimated  from  the  array  of  terrain  elevations.  It  is 
convenient  to  use  a short-hand  here  for  elevations  in  the 
neighborhood  of  a particular  point  (see  Fig.  3).  In  the 
context of a single point at discrete coordinate (i, j), we
will denote the elevation at that point by $z_{00}$, while
elevations of the adjacent grid points to the West and
East will be called $z_{-0}$ and $z_{+0}$ respectively. Similarly,
elevations at the points to the South and North will be
denoted $z_{0-}$ and $z_{0+}$.

Figure 3: Short-hand notation for elevations of neighboring points.

Gradient Smoothing Effects


The  simplest  estimates  for  the  slope  p might  be 

$$p_+ = (z_{+0} - z_{00})/\Delta x \quad\text{and}\quad p_- = (z_{00} - z_{-0})/\Delta x,$$

where $\Delta x$ is the grid interval in the West-to-East
direction,  expressed  in  the  same  units  as  the  terrain 
elevations.  These  estimates  are  biased,  actually 
estimating  the  slope  half  a grid  interval  to  the  right  and 
left  of  the  central  point  respectively.  Their  average 
however,  the  central  difference,  is  unbiased, 

$$p_c = (z_{+0} - z_{-0})/(2\Delta x).$$

Numerical  analysis  [144—146]  teaches  us  that  for 
certain  classes  of  surfaces  an  even  better  estimate  is 
obtained  using  a weighted  average  of  three  such  central 
differences, 

$$p_w = [(z_{++} + 2z_{+0} + z_{+-}) - (z_{-+} + 2z_{-0} + z_{--})]/(8\Delta x).$$

Symmetrically, one can estimate the South-to-North slope,

$$q_w = [(z_{++} + 2z_{0+} + z_{-+}) - (z_{+-} + 2z_{0-} + z_{--})]/(8\Delta y).$$

These  expressions  produce  excellent  estimates  for 
the  components  of  the  gradient  of  the  central  point.  The 
results depend on elevations in a 3 × 3 neighborhood,
with  individual  elevation  values  weighted  less  than  they 
are  in  the  simpler  expression  for  the  central  difference. 
This  has  the  advantage  that  local  errors  in  terrain 
elevation  tend  not  to  contribute  as  heavily  to  error  in 
slope.  At  the  same  time,  more  calculations  are  required 
and  three  rows  of  the  digital  terrain  model  have  to  be 
available  at  one  time. 
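
The central-difference and weighted estimates can be written out directly. This is a sketch under the assumption that the terrain model is stored as `z[i][j]`, with `i` increasing to the East and `j` to the North; the function names are hypothetical, and the caller is expected to pass interior points only.

```python
def p_central(z, i, j, dx):
    """Central-difference estimate of the West-to-East slope."""
    return (z[i + 1][j] - z[i - 1][j]) / (2.0 * dx)


def p_weighted(z, i, j, dx):
    """Weighted average of three central differences (West-to-East slope)."""
    east = z[i + 1][j + 1] + 2.0 * z[i + 1][j] + z[i + 1][j - 1]
    west = z[i - 1][j + 1] + 2.0 * z[i - 1][j] + z[i - 1][j - 1]
    return (east - west) / (8.0 * dx)


def q_weighted(z, i, j, dy):
    """Weighted average of three central differences (South-to-North slope)."""
    north = z[i + 1][j + 1] + 2.0 * z[i][j + 1] + z[i - 1][j + 1]
    south = z[i + 1][j - 1] + 2.0 * z[i][j - 1] + z[i - 1][j - 1]
    return (north - south) / (8.0 * dy)
```

On a perfect plane all of these return the true slope. If a single elevation, say the eastern neighbor, is in error by an amount e, the central difference is off by e/(2 dx), whereas the weighted estimate is off by only e/(4 dx), which is the reduced sensitivity to local errors mentioned above.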

Care  has  to  be  taken  to  avoid  corruption  of  the 
slope  estimates  by  quantization  noise  in  the  elevation 
values.  Numerical  problems  due  to  the  division  of  small 
integers  may  result  when  a terrain  model  is  too  finely 
interpolated,  with  limited  vertical  resolution.  If  it  is 
necessary  to  generate  many  pixels  in  the  output,  it  is 
better  to  interpolate  the  gray  values  produced  by  the 
shading  algorithm. 


More  complicated  slope  estimators  than  the  ones 
described  tend  to  introduce  a smoothing  effect,  as  can  be 
seen  by  applying  them  near  points  of  discontinuity  in 
slope.  To  illustrate  this  more  clearly,  consider  two 
horizontal  smoothing  operations  H+  and  H-  that  modify 
the  terrain  model  as  follows. 

H+: $z'_{00} = (z_{00} + z_{+0})/2$
and H-: $z'_{00} = (z_{-0} + z_{00})/2$

It can now be seen that the central difference slope
estimate, $p_c$, on the original terrain model, equals the
biased estimate, $p_+$, calculated from the terrain model
smoothed using H-, or, equivalently, the biased estimate,
$p_-$, calculated from the terrain model smoothed using
H+. Next consider two vertical smoothing operations V+
and  V-  in  which  the  terrain  model  is  modified  as  follows, 

V+: $z'_{00} = (z_{00} + z_{0+})/2$
and V-: $z'_{00} = (z_{0-} + z_{00})/2$

The complicated slope estimate, $p_w$, can be shown to
produce the same result as the first difference $p_+$
calculated from a terrain model smoothed by applying
H-, V+ and V-. Similarly the slope estimate, $q_w$, equals
$q_+$ calculated from a terrain model smoothed by applying
V-, H+ and H-. (Actually, since all of these operations
are linear, their order can be arbitrarily rearranged.)
Perhaps  any  "smoothing"  desired  should  be  done  as  a 
separate  editing  operation,  combined  with  the  removal  of 
"glitches"  from  the  digital  elevation  model,  rather  than  as 
part  of  the  slope  estimation.  Also  for  terrain  models  of 
relatively  limited  size  this  smoothing  may  be  undesirable. 
Some  other  slope  estimators  are  simpler  and  introduce 
less  smoothing.  For  example  one  can  combine  two  biased 
estimates  of  the  slope  to  get, 

$$p_{1/2} = [(z_{++} + z_{+0}) - (z_{0+} + z_{00})]/(2\Delta x)$$

and symmetrically,

$$q_{1/2} = [(z_{++} + z_{0+}) - (z_{+0} + z_{00})]/(2\Delta y).$$

Here the average gradient in the top-right quadrant,
$(z_{00}, z_{+0}, z_{++}, z_{0+})$, rather than at the central point is
being estimated, using elevations in a 2 × 2 neighborhood
only. For the graphic illustrations presented here, the
expressions for $p_{1/2}$ and $q_{1/2}$ were used to estimate the
gradient. 
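
The equivalences between slope estimators and smoothing operations stated in this section are easy to check numerically. In the sketch below the terrain is represented as a function of integer grid indices so that no boundary handling is needed; this representation, the test surface `terrain`, and all function names are assumptions made for the illustration. The last check, that the 2 × 2 estimate equals the first difference taken after V+ alone, follows from the definitions and shows why that estimator smooths less than the weighted one.

```python
import math


def terrain(i, j):
    """Arbitrary test surface sampled at integer grid points (hypothetical)."""
    return math.sin(0.7 * i) * math.cos(0.4 * j) + 0.1 * i * j


# Smoothing operations: each takes a terrain function and returns a new one.
def h_plus(z):
    return lambda i, j: 0.5 * (z(i, j) + z(i + 1, j))


def h_minus(z):
    return lambda i, j: 0.5 * (z(i - 1, j) + z(i, j))


def v_plus(z):
    return lambda i, j: 0.5 * (z(i, j) + z(i, j + 1))


def v_minus(z):
    return lambda i, j: 0.5 * (z(i, j - 1) + z(i, j))


# Slope estimators for the West-to-East direction.
def p_plus(z, i, j, dx):
    return (z(i + 1, j) - z(i, j)) / dx


def p_central(z, i, j, dx):
    return (z(i + 1, j) - z(i - 1, j)) / (2.0 * dx)


def p_weighted(z, i, j, dx):
    east = z(i + 1, j + 1) + 2.0 * z(i + 1, j) + z(i + 1, j - 1)
    west = z(i - 1, j + 1) + 2.0 * z(i - 1, j) + z(i - 1, j - 1)
    return (east - west) / (8.0 * dx)


def p_half(z, i, j, dx):
    return ((z(i + 1, j + 1) + z(i + 1, j)) - (z(i, j + 1) + z(i, j))) / (2.0 * dx)


def largest_discrepancies(dx=30.0, size=6):
    """Largest absolute differences found over a small block of grid points."""
    points = [(i, j) for i in range(size) for j in range(size)]
    smoothed_c = h_minus(terrain)
    smoothed_w = v_minus(v_plus(h_minus(terrain)))
    smoothed_h = v_plus(terrain)
    return {
        "central": max(abs(p_central(terrain, i, j, dx) - p_plus(smoothed_c, i, j, dx))
                       for i, j in points),
        "weighted": max(abs(p_weighted(terrain, i, j, dx) - p_plus(smoothed_w, i, j, dx))
                        for i, j in points),
        "half": max(abs(p_half(terrain, i, j, dx) - p_plus(smoothed_h, i, j, dx))
                    for i, j in points),
    }
```

All three discrepancies should be at the level of floating-point rounding, whatever test surface is substituted for `terrain`.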

At  this  time  some  terrain  models  are  still  produced 
by  hand  and  have  rather  limited  size.  Rather  than 
smoothing  the  terrain,  one  may  wish  to  increase  apparent 
resolution  by  some  means.  This  can  be  done  quite 
effectively  by  combining  biased  slope  estimates  (see 
Fig. 4). For every point in the terrain model, four gray
values are produced corresponding to the four quadrants
around it. Each is based on a different combination of
the slope estimates ($p_-$ or $p_+$) and ($q_-$ or $q_+$) as
appropriate for that quadrant. No miracles should be
anticipated;  this  method  cannot  create  information 
where  there  is  none,  but  it  can  stretch  what  is  available 
to  its  limits. 
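
One plausible reading of this scheme is sketched below: the quadrant to the North-East of a grid point uses $p_+$ and $q_+$, the one to the North-West uses $p_-$ and $q_+$, and so on. The pairing of estimates with quadrants, the grid layout (`z[i][j]`, `i` to the East, `j` to the North), the function names, and the `toy_reflectance` used for the example are all assumptions of this illustration.

```python
import math


def toy_reflectance(p, q):
    """Illustrative reflectance map: brighter for elements facing North-West."""
    s = (p - q) / math.sqrt(2.0)
    return 0.5 + 0.5 * s / math.sqrt(1.0 + s * s)


def quadrant_grays(z, i, j, dx, dy, reflectance):
    """Four gray values for the quadrants around the interior point (i, j)."""
    p_minus = (z[i][j] - z[i - 1][j]) / dx  # slope just West of the point
    p_plus = (z[i + 1][j] - z[i][j]) / dx   # slope just East of the point
    q_minus = (z[i][j] - z[i][j - 1]) / dy  # slope just South of the point
    q_plus = (z[i][j + 1] - z[i][j]) / dy   # slope just North of the point
    return {
        "NE": reflectance(p_plus, q_plus),
        "NW": reflectance(p_minus, q_plus),
        "SW": reflectance(p_minus, q_minus),
        "SE": reflectance(p_plus, q_minus),
    }
```

At an isolated peak the four values differ, with the North-West quadrant brightest and the South-East darkest under this reflectance map, whereas on a plane all four coincide and nothing is gained.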


Rotated  Gradients 

It has been cartographic practice to assume a light
source  in  the  North-West  at  a 45°  elevation  above  the 
horizon.  It  is  helpful  in  this  case  to  introduce  a rotated 
coordinate  system  (see  Fig.  5)  with 

p'=  (p  — q)/&  and  / = (p  + q)/'ft 


Figure  4:  Combinations  of  biased  slope  estimates  can  be 
used  to  plot  four  times  as  many  grey-tones  as 
there are elevation values in the terrain model.
The  limited  amount  of  data  in  a small  terrain 
model  may  be  stretched  this  way  to  produce 
reasonably  detailed  hill-shading  output. 


If $\Delta x = \Delta y = \Delta$ say, then the slopes in the North-West
to South-East and in the South-West to North-East
directions can be estimated particularly easily by
combining the formulas for $p_w$ and $q_w$,

$p'_w = [(z_{+0} + z_{+-} + z_{0-}) - (z_{-0} + z_{-+} + z_{0+})] / (4\sqrt{2}\,\Delta)$

$q'_w = [(z_{0+} + z_{++} + z_{+0}) - (z_{0-} + z_{--} + z_{-0})] / (4\sqrt{2}\,\Delta)$

If one wishes to estimate the slopes for the center
of the top-right quadrant (in the unrotated coordinate
system) rather than the central point one may combine
the expressions for $p_{1/2}$ and $q_{1/2}$ to get the simple
formulas,

$p'_{1/2} = (z_{+0} - z_{0+}) / (\sqrt{2}\,\Delta)$

and $q'_{1/2} = (z_{++} - z_{00}) / (\sqrt{2}\,\Delta)$

One advantage of the rotated coordinate system stems
from the fact that models of surface reflectance
considered here are symmetric with respect to a line
pointing towards the source. That is, a surface element
with slopes $p' = p'_0$ and $q' = q'_0$ say, has the same
apparent brightness as one with slopes $p' = p'_0$ and
$q' = -q'_0$. Thus a lookup table based on the rotated
coordinate system can be smaller, since only that half of
the table corresponding to $q' \ge 0$ need be stored.


Figure 5: Rotated coordinate system that may be
convenient when the assumed light-source is in
the North-West. The reflectance map is
symmetrical about the p'-axis.

So  far  we  have  assumed  that  the  grid  of  the  terrain 
model  is  aligned  with  the  geographical  coordinates.  If 
instead the whole model is rotated anti-clockwise by an
angle $\theta$, then slopes $p''$ and $q''$ can first be estimated
from the model as described and then transformed as
follows:

$p = p'' \cos\theta - q'' \sin\theta$

and $q = p'' \sin\theta + q'' \cos\theta$

Alternatively,  the  model  can  be  resampled  to  produce  a 
new  version  on  a grid  aligned  with  the  axes. 
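
The transformation of slopes estimated on a rotated grid might be coded as follows. This is a sketch only; the function name is hypothetical and the angle is assumed to be given in radians.

```python
import math


def unrotate_gradient(p2, q2, theta):
    """Convert slopes (p'', q'') measured along the axes of a grid that is
    rotated anti-clockwise by theta into slopes (p, q) along the
    West-to-East and South-to-North directions."""
    c, s = math.cos(theta), math.sin(theta)
    p = p2 * c - q2 * s
    q = p2 * s + q2 * c
    return p, q


# A grid turned by 90 degrees: a unit slope along its first axis
# is a unit slope in the South-to-North direction.
p, q = unrotate_gradient(1.0, 0.0, math.pi / 2.0)
```

Since this is a pure rotation, the magnitude of the gradient is unchanged; only its direction is corrected.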

More  complicated  slope  estimators  than  those 
discussed  here  do  not  seem  called  for,  since  the  simple 
ones  shown  produce  excellent  results.  Furthermore, 
estimators  having  wider  support,  while  known  to  be  more 
accurate  for  certain  classes  of  functions  such  as 
polynomials, may perform worse on typical terrain with
its discontinuities in slope along ridge and stream lines.

Exaggeration  of  Terrain  Elevations 

Compared  to  objects  of  a size  that  allow  for  easy 
manipulation  by  a human  observer,  the  surface  of  the 
earth  is  in  many  places,  though  not  everywhere,  rather 
flat.  The  range  of  slopes  is  often  so  small  as  to  cause 
disappointment  with  correctly  proportioned  models,  so 
that height is often exaggerated in physical models.
Similarly, shading based on models of light reflection
from a surface tends to have undesirably low contrast.
Here  too  terrain  elevations  may  be  exaggerated  for  all 
but  the  most  mountainous  regions.  This  is  equivalent  to 
multiplication  of  the  components  of  the  gradient  by  a 
constant  factor,  and  corresponds  to  a simple 
transformation  of  the  reflectance  map.  For  reflectance 
maps  based  on  reflection  of  light  originating  from  an 
assumed  source,  a similar  effect  can  often  be  achieved  by 
a decrease  in  the  elevation  of  the  source.  For  flat 
surfaces  the  source  may  be  lowered  to  a mere  10°  or  20° 
above  the  horizon,  where  normally  it  might  be  at  45°. 

Producing  Shaded  Overlays 

The generation of shaded images from a digital
terrain model using the reflectance map is
straightforward (see Fig. 1). For each point in the
terrain model the local gradient (p,q) is found. The
reflectance  map  then  provides  the  appropriate  brightness 
R(p,q),  to  be  plotted  on  a suitable  gray-level  output 
device.  All  computations  are  local  and  can  be 
accomplished  in  a single  pass  through  the  image. 

To illustrate these ideas a simple program is shown
(see Fig. 6) that does not incorporate any of the
elaborations described later on. Two arrays are used, Z,
to store the terrain elevations, and B, to store the
calculated brightness values. The latter has one row and
one column fewer, since its entries correspond to points
lying between those in the elevation array (the formulas
for $p_{1/2}$ and $q_{1/2}$ are used). The spacing of the underlying
grid is DX in the West-to-East direction and DY in the
South-to-North direction. The procedures PE(I,J) and
QE(I,J) estimate the slopes, while the procedure R(P,Q)
calculates the corresponding brightness using a
particularly simple reflectance map. The resulting values
range from 0.0 (black) to 1.0 (white) and have to be
scaled appropriately before they can be fed to a
particular gray-level output device.


begin SHADING (N, M, DX, DY);
    array Z (N, M);
    array B (N-1, M-1);

    <read terrain elevations into array Z>

    do J = 1 to M-1;
        do I = 1 to N-1;
            B(I-1, J-1) := R(PE(I, J), QE(I, J));
        end do;
    end do;

    <write brightness values from array B>

    begin procedure PE(I, J);
        (Z(I, J) + Z(I-1, J) - Z(I, J-1) - Z(I-1, J-1)) / (2.0 * DX);
    end PE;

    begin procedure QE(I, J);
        (Z(I, J) + Z(I, J-1) - Z(I-1, J) - Z(I-1, J-1)) / (2.0 * DY);
    end QE;

    begin procedure R(P, Q);
        max(0.0, min(1.0, (1.0 + P - Q) / 2.0));
    end R;

end SHADING;


Figure 6: Simple program to generate shaded output
from a terrain model.
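
For readers who wish to experiment, the program of Fig. 6 can be transliterated roughly as shown here. This is a sketch under stated assumptions rather than the original code: the elevations are assumed to be held in a list of rows ordered from South to North, each row ordered from West to East, and the function names are illustrative.

```python
def reflectance(p, q):
    """Particularly simple reflectance map, clipped to the range 0.0 to 1.0."""
    return max(0.0, min(1.0, (1.0 + p - q) / 2.0))


def shade(z, dx, dy):
    """Return brightness values for the centres of the 2 x 2 cells of z.

    z[i][j] is the elevation in row i (South to North) and
    column j (West to East); the result has one row and one
    column fewer than z.
    """
    brightness = []
    for i in range(1, len(z)):
        row = []
        for j in range(1, len(z[i])):
            p = (z[i][j] + z[i - 1][j] - z[i][j - 1] - z[i - 1][j - 1]) / (2.0 * dx)
            q = (z[i][j] + z[i][j - 1] - z[i - 1][j] - z[i - 1][j - 1]) / (2.0 * dy)
            row.append(reflectance(p, q))
        brightness.append(row)
    return brightness


# A level surface comes out mid-gray; a surface rising to the East
# with unit slope comes out white.
flat = shade([[0.0] * 3 for _ in range(3)], 1.0, 1.0)
ramp = shade([[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]], 1.0, 1.0)
```

The values still have to be scaled to the range of the output device, for example by multiplying by 255 and rounding.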

Typical  terrain  models  are  quite  large  and  may 
exceed  allowable  array  storage  limits  or  even  the  address 
space  of  a computer.  Fortunately  only  two  (or  three) 
rows  of  the  terrain  model  are  needed  for  the  estimation 
of  the  local  slopes.  The  program  given  can  be  easily 
modified  to  read  the  terrain  model,  and  to  write  the 
calculated  gray  values,  one  line  at  a time.  This  makes  it 
possible  to  deal  with  terrain  models  of  essentially 
arbitrary  size. 
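
One way to arrange the line-at-a-time processing is to keep only the previous row of elevations in memory. The generator below is a minimal sketch of that idea; its name and the row ordering (South to North, each row West to East) are assumptions.

```python
def reflectance(p, q):
    """Simple reflectance map, clipped to the range 0.0 to 1.0."""
    return max(0.0, min(1.0, (1.0 + p - q) / 2.0))


def shade_rows(rows, dx, dy):
    """Yield one row of brightness values per pair of adjacent elevation rows.

    rows may be any iterable (for example a file reader), so the
    whole terrain model never has to be held in memory.
    """
    south = None
    for north in rows:
        if south is not None:
            out = []
            for j in range(1, len(north)):
                p = (north[j] + south[j] - north[j - 1] - south[j - 1]) / (2.0 * dx)
                q = (north[j] + north[j - 1] - south[j] - south[j - 1]) / (2.0 * dy)
                out.append(reflectance(p, q))
            yield out
        south = north  # only the most recent row is retained


for gray_row in shade_rows(iter([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]), 1.0, 1.0):
    pass  # write gray_row to the output device or file here
```

Storage is then proportional to the width of the model rather than to its area.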

Next  one  should  note  that  terrain  models  typically 
are  stored  using  integer  (fixed  point)  representation  for 
elevations  to  achieve  compactness  and  because  elevations 
are  only  known  with  limited  precision  (an  elevation  may 
be  given  in  meters  as  a 16-bit  quantity  for  example). 
Similarly,  gray  values  to  be  sent  to  a graphic  output 
device  are  typically  quantized  to  relatively  few  levels 
because  of  the  limited  ability  of  the  human  eye  to 
discern  small  brightness  differences  and  the  limited  ability 
of  the  device  to  accurately  reproduce  these  (a  typical 
output  device  may  take  values  between  0 and  255.)  The 
calculations  can  thus  be  carried  out  largely  in  integer 
(fixed  point)  arithmetic  and  even  a simple  computer  is 
adequate. 


Use  Of  Lookup  Tables 

Some  of  the  formulas  for  reflectance  maps  discussed 
later  on  are  quite  elaborate  and  it  would  seem  that  a lot 
of  computation  is  required  to  produce  shaded  output 
using  them.  Fortunately  it  is  possible  to  make  the 
amount  of  computation  equally  small  in  all  cases  by 
implementing  the  reflectance  map  as  a lookup  table. 
This  table  is  computed  only  once,  at  the  beginning. 

Since  elevations  are  quantized,  so  are  the  estimates 
of  slope.  It  is  therefore  not  necessary  that  one  be  able  to 
determine  the  apparent  brightness  for  all  possible  values 
of  the  gradient  (p,q).  Further,  it  is  reasonable  to  place 
an  upper  limit  on  slope,  so  that  only  a finite  number  of 
possible values can occur (for example, if slopes between
-1.55 and +1.60 are considered, in increments of 0.05,
then there are only 64 possibilities for p and 64 for q,
and a lookup table with 4096 entries can be used). A
second justification for the use of a lookup table is the
quantization  of  the  gray  values  produced.  It  makes  little 
sense  to  calculate  the  apparent  brightness  with  very  high 
precision  only  to  coarsely  quantize  the  result.  A 
convenient  rule  of  thumb  is  that  the  number  of  possible 
discrete  values  for  each  gradient  component  need  not  be 
more than the number of gray levels available from
the output device. The final choice of quantization must
take  into  account  both  of  the  above  considerations. 

One  can  separate  the  estimation  of  slope  from  the 
calculation  of  gray  value,  and  produce  an  intermediate 
file  of  coded  surface  gradient  values.  This  file  need  not 
be larger than the original terrain model if the gradient is
quantized  properly  (if  p and  q can  each  take  on  64 
values,  each  gradient  can  be  encoded  as  a 12  bit  value). 
Such  a surface  gradient  file  can  be  fed  through  a lookup 
table  procedure  to  produce  the  final  output.  In  this 
fashion  different  reflectance  maps,  encoded  as  different 
lookup  tables,  can  be  applied  to  a terrain  model  easily, 
with  little  more  effort  than  reading  and  writing  a file. 
The  illustrations  here  were  produced  this  way. 
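
The scheme can be sketched as follows, using the quantization of the example above (64 levels from -1.55 in steps of 0.05, packed into a 12-bit code). The function names, the packing order and the choice of 256 gray levels are assumptions made for illustration only.

```python
P_MIN = -1.55   # smallest slope represented
STEP = 0.05     # slope increment
LEVELS = 64     # quantization levels per gradient component


def quantize(slope):
    """Map a slope to an index 0..63, clipping values outside the range."""
    k = int(round((slope - P_MIN) / STEP))
    return max(0, min(LEVELS - 1, k))


def encode(p, q):
    """Pack the two 6-bit indices into one 12-bit gradient code."""
    return quantize(p) * LEVELS + quantize(q)


def build_table(reflectance, gray_levels=256):
    """Tabulate a reflectance map once, as integer gray values."""
    table = []
    for code in range(LEVELS * LEVELS):
        p = P_MIN + (code // LEVELS) * STEP
        q = P_MIN + (code % LEVELS) * STEP
        table.append(int(round(reflectance(p, q) * (gray_levels - 1))))
    return table


def simple_reflectance(p, q):
    return max(0.0, min(1.0, (1.0 + p - q) / 2.0))


table = build_table(simple_reflectance)
gray = table[encode(0.4, -0.2)]   # one table lookup per picture cell
```

Changing the reflectance map then only requires rebuilding the 4096-entry table; the file of gradient codes is left untouched.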

Many gray-level raster displays have a translation
table between the image memory and the digital-to-analog
converter driving the cathode ray tube intensity
control.  The  quantized,  packed  reflectance  map  can  be 
loaded  into  this  lookup  table,  while  the  image  memory  is 
loaded  with  the  coded  slope  matrix.  This  allows  one  to 
view  the  same  terrain  with  a variety  of  assumed 
reflectance  properties  simply  by  reloading  the  translation 
table,  which  is  small  compared  to  the  image  memory. 

Taxonomy  of  Reflectance  Maps 

Here  we  have  discussed  some  of  the  issues  one  is 
likely  to  encounter  when  developing  a program  that 
produces  shaded  output.  In  the  remainder  of  this  paper 
we  will  analyze  a number  of  proposed  hill-shading 
methods  in  terms  of  their  equivalent  reflectance  maps. 
Notational  tools  will  be  introduced  as  they  are  needed. 
Rather than proceed in strict historic order, we will
discuss  relief  shading  methods  in  the  following  groups: 

1)  Rotationally  symmetric  reflectance  maps  — 
gray tone depends on slope only.

2) Methods based on varying line spacing or
thickness to modulate average reflectance.

3) Ideal diffuse reflectance and various
approximations thereto.

4) Gray tone depends only on the slope of the
surface in the direction away from the
assumed light source.

5)  Methods  depending  on  more  elaborate 
models  of  diffuse  reflectance  from  porous 
material,  such  as  that  covering  the  lunar 
surface. 

6)  Models  for  gloss  and  lustrous  reflection  — 
smooth  surface,  extended  source  and  rough 
surface,  point  source. 

Average  Reflectance  Of  Evenly  Spaced  Dark  Lines 

Some  early  methods  for  hill-shading  achieve  the 
desired  control  of  gray  tone  by  varying  the  spacing 
between  printed  lines.  One  advantage  of  this  approach  is 
the  ease  with  which  such  information  can  be  printed, 
since  it  is  not  necessary  to  first  screen  a continuous  tone 
image.  One  disadvantage  is  the  confusion  created  when 
the lines used for this purpose are laid on top of others
portraying  planimetric  information.  While  the  directional 
textural  effects  of  the  lines  are  important  in  conveying 
information  about  shape,  we  concentrate  here  on  the 
average  reflectance. 

Figure 7: Magnified portion of surface covered with
lines. The average tone depends on the
fractional area covered by the lines, as well as
the reflectance of the paper and the ink.


Consider inked lines with reflectance $r_b$ covering an
area of paper with reflectance $r_w$ (see Fig. 7). The ratio
of the area covered by ink to the area not covered is the
same as the ratio of the width of the lines to the width
of the uninked spaces. This in turn equals $b/w$, where $b$
is the width of the inked line and $w$ the width of the
uninked space measured along any direction not parallel
to the lines. If we ignore diffusion of light in the paper,
then the average reflectance of the surface is

$R = (w\,r_w + b\,r_b)/(w + b)$

Or, $R = r_w - b\,(r_w - r_b)/(w + b)$

If, for example, the paper reflects all the incident light,
and the ink none, then $r_w = 1$ and $r_b = 0$, so that
$R = 1 - b/(w + b)$.
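
As a small numerical illustration of the average-reflectance formula (the function name and default values are assumptions):

```python
def line_pattern_reflectance(b, w, r_w=1.0, r_b=0.0):
    """Average reflectance of inked lines of width b separated by
    uninked spaces of width w, ignoring diffusion of light in the paper."""
    return (w * r_w + b * r_b) / (w + b)


# Lines one unit wide separated by three units of white paper
# cover a quarter of the area.
r = line_pattern_reflectance(1.0, 3.0)
```

Only the ratio of b to w matters, so the same tone can be had with many fine lines or a few heavy ones.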

Slope  Of  The  Surface  In  An  Arbitrary  Direction 

In  the  calculation  of  gray  value  produced  by  some 
methods  of  hill-shading  it  is  necessary  to  know  the  slope 
of  the  surface  in  an  arbitrary  direction,  given  the  slope  p 
in  the  West-to-East  direction  and  the  slope  q in  the 
South-to-North  direction.  Note  that  p and  q are  the  first 
partial  derivatives  of  the  elevation  z with  respect  to  x 
and  y respectively.  Consider  taking  an  infinitesimal  step 
dx  in  the  x direction  and  an  infinitesimal  step  dy  in  the 
y direction.  The  change  in  elevation,  dz,  is  given  by 

$dz = p\,dx + q\,dy$

Along a contour line for example, the elevation is
constant, so that for a small step $dx = a\,ds$ and $dy = b\,ds$,
we can write:

$(p, q) \cdot (a, b)\,ds = 0$

where $\cdot$ denotes the dot-product. The local direction of
the contours, $(a, b)$, is of course perpendicular to the local
gradient $(p, q)$.

Now consider taking a small step in an arbitrary
direction, $(p_0, q_0)$ say. That is, let $dx = p_0\,ds$ and
$dy = q_0\,ds$. The length of the step, measured in the
xy-plane is,

$\sqrt{p_0^2 + q_0^2}\;ds$

While the change in elevation is,

$dz = (p_0\,p + q_0\,q)\,ds$

Consequently the slope, change in elevation divided by
length of the step, is,

$s = (p_0\,p + q_0\,q) / \sqrt{p_0^2 + q_0^2}$

If we let $\alpha$ be the angle between the vector $(p_0, q_0)$ and
the x-axis, then the above can also be written,

$s = p \cos\alpha + q \sin\alpha$

The direction in the xy-plane in which the slope is
maximal can be found by differentiating with respect to
$\alpha$. The direction of steepest ascent is $(p, q)$ and the
maximum slope equals $\sqrt{p^2 + q^2}$.
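
The directional slope and its extremes are easily checked numerically. The following is a sketch; the function name is an assumption and the angle is taken in radians, measured anti-clockwise from the x-axis.

```python
import math


def directional_slope(p, q, alpha):
    """Slope of the surface along the direction making angle alpha
    with the x-axis, given the gradient (p, q)."""
    return p * math.cos(alpha) + q * math.sin(alpha)


p, q = 3.0, 4.0
steepest = math.atan2(q, p)                                        # direction of the gradient
s_max = directional_slope(p, q, steepest)                          # equals sqrt(p*p + q*q)
s_contour = directional_slope(p, q, steepest + math.pi / 2.0)      # along the contour
```

The slope vanishes at right angles to the gradient, that is, along the contour line.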


Lehmann’s Böschungsschraffen A

One  of  the  earliest  methods  for  depicting  surface 
shape  using  a form  of  shading  is  that  of  Lehmann  [2,3]. 
Illustrations  based  on  ad  hoc  scales  of  increasing  darkness 
as a function of slope ("Schwärzegradscalen") had been
published  before,  but  there  was  no  systematic  analysis  of 
this  approach  until  the  appearance  of  an  anonymous 
publication  attributed  to  Lehmann.  In  his  method,  short 
lines  in  the  direction  of  steepest  descent,  called  hachures, 
are  drawn  with  spacing  and  thickness  specified  by  rules 
that  ensure  that  the  fractional  area  darkened  is 
proportional to the angle of inclination of the surface, $\theta$.
That is, steeper implies darker. The lines merge,
producing a continuous black area, when $\theta$ exceeds some
maximum value $\theta_0$, typically 45° or 60°. The slope of
the surface equals the tangent of the angle of inclination
or "dip". Using the expression for the slope in the
direction of steepest ascent, we get,

$\tan\theta = \sqrt{p^2 + q^2}$

Consequently, the average reflectance is,

$R(p,q) = r_w - (r_w - r_b)\,\tan^{-1}\!\sqrt{p^2 + q^2}\;/\;\theta_0$

When the angle of inclination exceeds the maximum, the
lines coalesce and $R(p,q) = r_b$. We can also write the
above in another form,

$R'(\theta,\phi) = r_w - (r_w - r_b)\,(\theta/\theta_0)$

Here, $\phi$ is the azimuth of the direction of steepest
descent, a quantity that does not appear in the formula
on the right, since apparent brightness here depends only
on the magnitude of the slope. The direction and
magnitude of the surface gradient can be found from a
map prepared according to Lehmann’s rules. The
direction of steepest descent lies along the hachures, while
the slope is directly related to the average tone that
results from the width and spacing of these lines. In
analyzing  his  method  we  have  concentrated  on  calculating 
the  average  reflectance  produced  in  the  printed  product. 
It  should  be  pointed  out  that  this  method  also  gives  rise 
to  interesting  textural  effects  that  will  not  be  discussed. 
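
The reflectance map equivalent to Lehmann's rule might be coded as below. This is a sketch; the function name, the default limiting angle of 45° and the default reflectances of paper and ink are assumptions.

```python
import math


def lehmann_reflectance(p, q, theta0_deg=45.0, r_w=1.0, r_b=0.0):
    """Average reflectance when the darkened fraction is proportional
    to the angle of inclination, saturating at theta0."""
    theta = math.atan(math.hypot(p, q))   # angle of inclination
    theta0 = math.radians(theta0_deg)
    if theta >= theta0:
        return r_b                        # hachures have merged
    return r_w - (r_w - r_b) * theta / theta0


level = lehmann_reflectance(0.0, 0.0)     # white paper on level ground
steep = lehmann_reflectance(2.0, 0.0)     # beyond 45 degrees: solid ink
```

Because only the magnitude of the gradient enters, the map is rotationally symmetric.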

Another  interesting  aspect  of  Lehmann’s  method  is 
that  the  lines  or  hachures  were  drawn  starting  on  one 
contour  and  ending  on  the  next.  This  greatly  contributed 
to  the  later  development  of  the  contour  representation 
(Isohypsen)  for  terrain  surfaces,  that  was  to  ultimately 
replace  most  of  these  early  attempts  at  portraying  surface 
shape  [7]. 

Contour  Density  B 

Another  method  is  based  on  the  observation  that 
lines  on  a contour  map  are  more  crowded  in  steep  areas 
and  that  this  crowding  leads  to  darkening  of  tone  or 
average  gray  value.  In  order  to  calculate  the  dependence 
of  the  average  local  reflectance  on  the  gradient,  (p,  q),  we 
have  to  determine  the  spacing  of  contour  lines  on  the 
map.  We  assume  that  the  surface  is  locally  smooth  and 
can  be  approximated  by  a plane,  at  least  on  the  scale  of 
the  spacing  between  contour  lines  (If  this  is  not  the  case, 
aliasing,  or  undersampling  problems  occur  in  any  case). 

Figure 8: Spacing between successive contour lines along
a given direction on the topographic map.


Consider a portion of the surface with slope $s$ in
some direction not parallel to the contour lines (see
Fig. 8). Assume that the map scale is $k$ and the vertical
contour interval $\delta$. Then it is clear that the spacing
between contours on the map, $d$, can be obtained from
the formula for slope,

$s = \delta/(d/k)$

If we take the cross-section of the surface in the
direction of steepest ascent, then $s = \sqrt{p^2 + q^2}$. As a
result we can write,

$d = k\,\delta / \sqrt{p^2 + q^2}$

On the map, $d = b + w$. That is, the spacing between
contours is the sum of the width of the contour lines and
the width of the blank spaces between them. The
average reflectance then is,

$R(p,q) = r_w - (b/k\delta)\,(r_w - r_b)\,\sqrt{p^2 + q^2}$

The result can also be expressed as,

$R'(\theta,\phi) = r_w - (b/k\delta)\,(r_w - r_b)\,\tan\theta$

where $\theta$ is the inclination of the surface. The above
expressions only hold if $w$ is not negative. When the
slope is too steep, contour lines overlap, and the average
reflectance is simply equal to $r_b$. In the special case that
$r_w = 1$ and $r_b = 0$, the above simplifies to,

$R(p,q) = 1 - (b/k\delta)\,\sqrt{p^2 + q^2}$

Typically $(b/k\delta)$ may equal 1 or $1/\sqrt{3}$.
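
A sketch of the contour-density reflectance map follows. The function and parameter names are assumptions; the single parameter stands for the ratio of line width to the product of map scale and contour interval, and the paper is assumed to be at least as bright as the ink.

```python
import math


def contour_density_reflectance(p, q, ratio=1.0, r_w=1.0, r_b=0.0):
    """Average reflectance of a contour map as a function of gradient.

    ratio is the line width divided by (map scale times contour
    interval). Where the lines would overlap, the result is r_b.
    """
    r = r_w - ratio * (r_w - r_b) * math.hypot(p, q)
    return max(r_b, r)


level = contour_density_reflectance(0.0, 0.0)   # no contours: white
steep = contour_density_reflectance(3.0, 4.0)   # lines overlap: black
```

With the ratio equal to 1 the map saturates at a slope of 1, that is, at an inclination of 45°.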

Diffuse Surface under Vertical Illumination C D

The methods discussed so far produce tones that
depend on the magnitude of the gradient only, not its
direction. This is similar to the effect one would obtain
if a physical model of the terrain was illuminated
vertically, with the light source placed near the viewer.
An ideal diffusing surface has an apparent brightness that
is proportional to the cosine of the incident angle, $i$, as
discussed later. This is the angle between the direction
of the incident rays and the local normal, which, in the
case of vertical illumination, is just $\theta$. Therefore,

$R'(\theta,\phi) = \cos\theta$

Or, $R(p,q) = 1 / \sqrt{1 + p^2 + q^2}$

Instead of illumination from a point source, one may
consider the effect of a distributed source. A uniform
hemispherical source illuminating a diffusely reflecting
surface leads to a result of the following form [39],

$R'(\theta,\phi) = \cos^2(\theta/2) = 1/2 + (1/2)\cos\theta$

Or, $R(p,q) = (1 + 1/\sqrt{1 + p^2 + q^2}\,) / 2$

This reflectance map leads to flatter, even less
interpretable pictures, since the range of reflectances has
been halved and all reflectances have been shifted
upwards by a half. In the derivation of the formula
above, reflection from the surrounding terrain surface is
ignored. If the terrain surface diffusely reflects a
fraction $\rho$ of the incident light, the constant term in the
above expression is increased from 1/2 to $(1 + \rho)/2$,
while the coefficient of $\cos\theta$ decreases from 1/2 to
$(1 - \rho)/2$. It is at times suggested that a component of
surface brightness due to distributed illumination from
the sky be added to that resulting from oblique
illumination. This however typically detracts from the
shaded result, rather than improving it.

The methods discussed so far give rise to
rotationally symmetric reflectance maps, that can be
described adequately by a single cross-section, showing
tone versus slope [1,37]. This representation has
sometimes been misused for asymmetric reflectance maps,
where it does not apply. Rotationally symmetric
reflectance maps produce shaded images that are difficult
to interpret. Moving the assumed light source away from
the overhead position gives rise to better shaded map
overlays, but forces us to introduce some new concepts.

The Surface Normal

The  surface  normal  is  a vector  perpendicular  to  the 
local  tangent  plane.  The  direction  of  the  surface  normal, 
n,  can  be  found  by  taking  the  cross-product  of  any  two 
vectors  parallel  to  lines  locally  tangent  to  the  surface  (as 
long  as  they  are  not  parallel  to  each  other).  We  can  find 
two such vectors by remembering that the change in
elevation when one takes a small step $dx$ in the x-direction
is just $dz = p\,dx$, while the change in
elevation corresponding to a step $dy$ in the y-direction is
$dz = q\,dy$. The two vectors, $(1,0,p)\,dx$ and $(0,1,q)\,dy$
are therefore parallel to lines tangent to the surface and
so their cross-product is a surface normal,

$n = (1,0,p) \times (0,1,q) = (-p,-q,1)$.

Note that the gradient, $(p,q)$, is just the (negative)
projection of this vector on the xy-plane. A unit surface
normal, $N$, can be obtained by dividing the vector $n$ by
its magnitude, $\sqrt{1 + p^2 + q^2}$.
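
The construction of the normal from the two tangent vectors can be verified with a few lines of code. This is a sketch; the function names are illustrative.

```python
import math


def cross(a, b):
    """Cross-product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit_normal(p, q):
    """Unit surface normal for a surface element with gradient (p, q)."""
    n = cross((1.0, 0.0, p), (0.0, 1.0, q))   # equals (-p, -q, 1)
    magnitude = math.sqrt(1.0 + p * p + q * q)
    return tuple(c / magnitude for c in n)


n = cross((1.0, 0.0, 2.0), (0.0, 1.0, 3.0))   # (-2.0, -3.0, 1.0)
N = unit_normal(2.0, 3.0)
```

The third component of the unit normal is the cosine of the inclination of the surface.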

Figure 9: Definition of the azimuth angle, $\phi$, and the
zenith angle, $\theta$. Here, azimuth is measured
counter-clockwise from the x-axis in the xy-plane,
while the zenith angle is measured from
the z-axis.


While it is convenient to specify directions as
vectors, it is at times helpful to use spherical coordinates
instead. A direction can then be given as an azimuth
angle, $\phi$, measured anti-clockwise from the x-axis, and a
polar or zenith angle, $\theta$ (see Fig. 9). (In navigation, the
azimuth angle is usually measured clockwise from North,
and the elevation angle is given instead of the zenith
angle. These are just the complements of the angles used
here.) The unit vector in the direction so defined equals,

$N = (\cos\phi \sin\theta,\ \sin\phi \sin\theta,\ \cos\theta)$

To find the azimuth and zenith angle of the surface
normal we identify components of corresponding unit
vectors. Then,

$\sin\phi = -q / \sqrt{p^2 + q^2}$
and $\cos\phi = -p / \sqrt{p^2 + q^2}$

while,

$\sin\theta = \sqrt{p^2 + q^2} / \sqrt{1 + p^2 + q^2}$
and $\cos\theta = 1 / \sqrt{1 + p^2 + q^2}$

Conversely,

$p = -\cos\phi \tan\theta$ and $q = -\sin\phi \tan\theta$

We will find it convenient to use both vector and
spherical coordinate notation to specify direction.
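
The conversion between the two notations can be sketched as follows. The function names are assumptions and angles are in radians; for a level surface the azimuth is undefined and the value returned is arbitrary.

```python
import math


def gradient_to_angles(p, q):
    """Azimuth and zenith angle of the normal (-p, -q, 1)."""
    phi = math.atan2(-q, -p)               # sin(phi) ~ -q, cos(phi) ~ -p
    theta = math.atan(math.hypot(p, q))    # tan(theta) = sqrt(p*p + q*q)
    return phi, theta


def angles_to_gradient(phi, theta):
    """Gradient of a surface element whose normal has the given angles."""
    t = math.tan(theta)
    return -math.cos(phi) * t, -math.sin(phi) * t


phi, theta = gradient_to_angles(0.3, -0.7)
p, q = angles_to_gradient(phi, theta)      # recovers (0.3, -0.7) up to rounding
```

Using the two-argument arctangent avoids having to sort out the quadrant of the azimuth by hand.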



Position  Of  The  Light  Source 

The  reflectance  maps  discussed  so  far  are 
rotationally  symmetric  about  the  origin,  only  the 
magnitude  of  the  gradient,  not  its  direction  affecting  the 
resulting  gray  value.  This  corresponds  to  a situation 
where  the  light  source  is  at  the  viewing  position.  Most 
hill-shading methods have the assumed light source in
some other position, typically in the North-West, with a
zenith angle of around 45° ($\theta_0 = 45^\circ$, $\phi_0 = 135^\circ$). The
unit vector,

$S = (\cos\phi_0 \sin\theta_0,\ \sin\phi_0 \sin\theta_0,\ \cos\theta_0)$

points directly at the light source. A surface element will
be illuminated maximally when the rays from the light
source strike it perpendicularly, that is, when the surface
normal points at the light source. By identifying
components in the expression for the surface normal,
$n_0 = (-p_0, -q_0, 1)$, with those in the expression for the
vector pointing at the source one finds that the
components of the gradient of such a surface element are,

$p_0 = -\cos\phi_0 \tan\theta_0$ and $q_0 = -\sin\phi_0 \tan\theta_0$

When the source is in the standard cartographic position,
this means,

$p_0 = 1/\sqrt{2}$ and $q_0 = -1/\sqrt{2}$
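
The gradient of the maximally illuminated surface element follows directly from the source angles. A sketch, with a hypothetical function name and angles given in degrees:

```python
import math


def source_gradient(azimuth_deg, zenith_deg):
    """Gradient (p0, q0) of a surface element whose normal points
    at a source with the given azimuth and zenith angle."""
    phi0 = math.radians(azimuth_deg)
    theta0 = math.radians(zenith_deg)
    p0 = -math.cos(phi0) * math.tan(theta0)
    q0 = -math.sin(phi0) * math.tan(theta0)
    return p0, q0


# Standard cartographic position: source in the North-West, 45 degrees up.
p0, q0 = source_gradient(135.0, 45.0)
```

For the standard position this reproduces, up to rounding, the values quoted above.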

This  standard  position  for  the  assumed  light  source  was 
probably  chosen  because  we  are  used  to  viewing  objects 
lighted  from  that  direction  [1].  When  we  look  at  nearby 
objects  in  front  of  us,  our  body  blocks  the  light  arriving 
from  behind  us.  Further,  when  writing  on  a horizontal 
surface,  many  of  us  find  our  right  hand  blocking  light 
coming  from  that  direction.  We  thus  often  arrange  for 
light  sources  to  be  to  the  left,  in  front  of  us.  While  we 
can certainly interpret shading in pictures where the light
source  is  not  in  this  standard  position,  there  seems  to  be 
a larger  possibility  of  depth  reversal  in  that  case, 
particularly  if  the  object  has  a complex,  unfamiliar  shape. 

Returning  now  to  the  specification  of  the  position 
of  the  light  source,  we  find  two  identities  that  will  be 
helpful  later. 

$\cos(\phi - \phi_0) = (p_0\,p + q_0\,q) / [\sqrt{p^2 + q^2}\,\sqrt{p_0^2 + q_0^2}\,]$

$p_0\,p + q_0\,q = \tan\theta \tan\theta_0 \cos(\phi - \phi_0)$

It also follows that the slope of the surface in the
direction, $(p_0, q_0)$, away from the light source is,

$s = \tan\theta \cos(\phi - \phi_0)$.


Figure 10: Side-view of a hill cut by inclined planes.
Viewed from above, the lines of intersection
crowd together where the surface slopes away
from the equivalent source, while there are
no lines where the terrain surface is parallel
to the inclined planes.


Tanaka’s Orthographical Relief Method E

A method proposed by Tanaka in 1930 [10,11]
involves drawing the lines of intersection of the surface
with evenly spaced inclined planes. These planes are
oriented so that their common normal points towards an
equivalent light source (see Fig. 10). Thus slopes tilted
away from this direction have contours spaced closely,
giving rise to heavier shading than that on horizontal
surfaces, while surfaces lying parallel to the inclined
planes are lightest. As in Lehmann’s method, some
information may be conveyed by the directional texture
of the contours. Here we concentrate on the average
reflectance only.

A contour is the intersection of the terrain’s surface
$z = z(x,y)$ with a plane. The equation $z = z_0$ applies to
a horizontal plane appropriate for ordinary contours. For
"inclined contours" an inclined plane is used with
equation of the form

$(-p_0, -q_0, 1) \cdot (x, y, z) / \sqrt{1 + p_0^2 + q_0^2} = z'_0$

The vector $(-p_0, -q_0, 1)$ is perpendicular to the inclined
planes. Ordinary contours represent the locus of the
solution of $z(x,y) = z_0$, while inclined contours are the
loci of solutions of the equation,

$[-p_0\,x - q_0\,y + z(x,y)] / \sqrt{1 + p_0^2 + q_0^2} = z'_0$

We can now apply our analysis of the contour
density model to the modified surface, $z'(x,y)$, defined by
the left hand side of this equation. All we need are the
slopes of this new surface. Differentiating the above
expression with respect to x and y, we get,

$p' = (p - p_0) / \sqrt{1 + p_0^2 + q_0^2}$

$q' = (q - q_0) / \sqrt{1 + p_0^2 + q_0^2}$

Finally then,

$R(p,q) = r_w - (b/k\delta)\,(r_w - r_b)\,\sqrt{(p - p_0)^2 + (q - q_0)^2} / \sqrt{1 + p_0^2 + q_0^2}$

We obtain the expression for contour density, derived
earlier, when $p_0 = q_0 = 0$. Also, in the special case that
$r_b = 0$, $r_w = 1$, $p_0 = 1/\sqrt{2}$, and $q_0 = -1/\sqrt{2}$,

$R(p,q) = 1 - (b/k\delta)\,\sqrt{(p - 1/\sqrt{2})^2 + (q + 1/\sqrt{2})^2} / \sqrt{2}$

It is sometimes useful to express the apparent brightness
as a function of the azimuth $\phi$ and zenith angle $\theta$ of the
surface normal. If we let $\phi_0$ be the azimuth and $\theta_0$ the
zenith angle of the normal to the inclined planes, then
the formula can be rewritten as follows,

$R'(\theta,\phi) = r_w - (b/k\delta)\,(r_w - r_b)\,\cos\theta_0\,\sqrt{\tan^2\theta - 2\tan\theta\tan\theta_0\cos(\phi - \phi_0) + \tan^2\theta_0}$

When $\theta_0 = 45^\circ$, $r_b = 0$ and $r_w = 1$, then, as Tanaka
showed [10,11],

$R'(\theta,\phi) = 1 - (b/k\delta)\,\sqrt{1 - \sin 2\theta \cos(\phi - \phi_0)} / (\sqrt{2}\,\cos\theta)$

How does one choose the parameter $(b/k\delta)$?
Tanaka  felt  that  the  shading  produced  by  his  method 
should  match  that  seen  on  a surface  covered  with  an 
ideal  material  called  a perfect  diffuser.  The  apparent 
brightness  of  such  a surface  varies  with  the  cosine  of  the 
incident  angle,  between  the  surface  normal  and  a vector 
pointing  at  the  light  source.  He  introduced  a parameter 
called the line factor that is the ratio of the width of the
inked line, $b$, to $k\delta/\sin\theta_0$, the interval between inclined
contours for a horizontal surface. The line factor is just,

$(b/k\delta)\,\sqrt{p_0^2 + q_0^2} / \sqrt{1 + p_0^2 + q_0^2}$

Tanaka proposed varying the line width $b$ in order to
produce shading that matches that seen on a perfect
diffuser, but realized the impracticality of this approach
for all but polyhedral surfaces [10,11]. Resigned to using
a fixed line width, he chose to optimize the line factor by
considering the brightness distribution on a spherical cap
extending to 45° slope. With the source at 45° elevation,
the least deviation from the brightness distribution one
would see if the surface was a perfect diffuser is obtained
when the line factor equals 0.3608. Consequently,
$(b/k\delta) = 0.3608\,\sqrt{2}$. Finally then,

$R(p,q) = 1 - 0.3608\,\sqrt{(p - 1/\sqrt{2})^2 + (q + 1/\sqrt{2})^2}$

It  is  unfortunate  that  this  method  later  gave  rise  to  some 
misunderstanding  as  well  as  a less  rigorous  hybridized 
form  [15]. 
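
As a rough illustration of the fixed-line-width formula for inclined contours, the sketch below evaluates the reflectance for a given gradient. The function name, the default source gradient (North-West at 45°, taken here as $p_0 = 1/\sqrt{2}$, $q_0 = -1/\sqrt{2}$) and the clipping at the ink reflectance are assumptions made for this example, not part of Tanaka's description.

```python
import math

def tanaka_inclined_contours(p, q,
                             p0=1 / math.sqrt(2), q0=-1 / math.sqrt(2),
                             line_factor=0.3608, r_w=1.0, r_b=0.0):
    """Reflectance of inclined contours drawn with a fixed line width."""
    g0_sq = p0 * p0 + q0 * q0
    # line factor = (b / (k*delta)) * sin(theta0), so recover b / (k*delta)
    sin_theta0 = math.sqrt(g0_sq) / math.sqrt(1.0 + g0_sq)
    b_over_kdelta = line_factor / sin_theta0
    # distance in gradient space from the orientation of the inclined planes
    dist = math.hypot(p - p0, q - q0)
    r = r_w - b_over_kdelta * (r_w - r_b) * dist / math.sqrt(1.0 + g0_sq)
    # Assumption: once the inked lines merge the tone cannot drop below r_b.
    return max(r_b, r)

print(tanaka_inclined_contours(0.0, 0.0))  # tone of a horizontal surface
```

A surface parallel to the inclined planes receives no lines at all and stays white, while a horizontal surface is darkened by exactly the line factor.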


Block  Diagrams 

A common representation for relief form is the
block diagram, an oblique view of a series of equally-spaced,
vertical profiles [100-104]. The projection
typically is orthographic, although at times a perspective
projection is utilized. Surfaces not visible to the viewer
are eliminated (see Fig. 11). Shading can of course be
applied to oblique views, as may be done in sophisticated
flight simulators of the future. We concentrate here on
map forms that provide for superposition of planimetric
information, however, and digress only to point out that
part of the appeal of block diagrams lies in their implicit
shading, due to the variation in the spacing of lines.

Following  the  discussion  in  the  last  section,  it  is 
clear  that  the  equivalent  light-source  position  is  in  the 
horizontal  plane  at  right  angles  to  the  vertical  cutting 
surfaces.  The  analysis  in  the  previous  section  then 
applies  directly.  Things  are  a little  more  difficult  if  the 
result  is  to  be  expressed  in  terms  of  the  coordinate 
system  of  the  surface  rather  than  one  oriented  with 
respect  to  the  viewer. 

We can analyze the shading apparent in block
diagrams by calculating the spacing between lines as a
function of the surface orientation. Let a local surface
normal be $\mathbf{n} = (-p, -q, 1)$. A series of parallel planes,
with common normal $\mathbf{s}$, cuts the terrain surface. The
intersections of these planes with the surface are viewed
from a direction specified by the vector $\mathbf{p}$. It is assumed
that the viewer is at a great distance so that the profiles
are projected orthographically along lines parallel to $\mathbf{p}$
(see Fig. 12).

The line of intersection of one of the cutting planes
with the local tangent plane will be parallel to the vector
$\mathbf{n} \times \mathbf{s}$, since the line lies in both planes and is therefore
perpendicular to the normals, $\mathbf{n}$ and $\mathbf{s}$. Now construct a
plane through the line of intersection and the viewer.


Figure  11:  "Block-diagram"  representation  of  terrain 
surface.  This  is  an  isometric  projection  of  a 
series  of  uniformly  spaced  vertical  profiles  of 
the  surface  viewed  from  the  South-East. 
Note  the  shading  effect  due  to  the  variation 
in line spacing.

This plane, called the viewing plane, contains both $\mathbf{n} \times \mathbf{s}$
and $\mathbf{p}$. The normal $\mathbf{e}$ of the viewing plane must therefore
be perpendicular to both and can be defined as,

$\mathbf{e} = (\mathbf{n} \times \mathbf{s}) \times \mathbf{p}$

or, $\mathbf{e} = (\mathbf{n} \cdot \mathbf{p})\,\mathbf{s} - (\mathbf{s} \cdot \mathbf{p})\,\mathbf{n}$

Figure 12: The viewing plane contains the viewer and
the line of intersection of the slicing plane
with the terrain surface. Line spacing in the
block-diagram equals the spacing between
successive viewing planes. The dotted line is
parallel to the vector $\mathbf{p}$.

If we let $\mathbf{v} = (x, y, z)$, then the equation for the local
tangent plane can be written,

$\mathbf{n} \cdot \mathbf{v} = c_n$

for some value of the constant $c_n$. Similarly, the equation
of a particular cutting plane is,

$\mathbf{s} \cdot \mathbf{v} = c_s$

Different values of $c_s$ correspond to different cutting
planes. The plane corresponding to the value $c_s + dc_s$ is
separated from the plane corresponding to the value $c_s$ by
a distance $dc_s/s$, where $s$ is the magnitude of the vector
$\mathbf{s}$. The equation for the viewing plane is just,

$\mathbf{e} \cdot \mathbf{v} = c_e$

Successive cutting planes will intersect the tangent plane
in parallel lines. These give rise to parallel viewing planes
corresponding to different values of the constant $c_e$. The
spacing of these viewing planes is of interest, since it
equals the spacing of the lines in the orthographic
projection. The plane corresponding to the value $c_e + dc_e$
is separated from the plane corresponding to the
value $c_e$ by a distance of $dc_e/e$, where $e$ is the magnitude
of the vector $\mathbf{e}$. In order to relate the spacing of lines in
the block diagram to the spacing of the cutting planes we
need to find the relationship between $dc_e$ and $dc_s$.

A point $\mathbf{v}$ on the line of intersection lies in all three
planes and therefore simultaneously satisfies the three
equations given above for these planes. Expanding the
last one of these, $\mathbf{e} \cdot \mathbf{v} = c_e$, we obtain,

$(\mathbf{n} \cdot \mathbf{p})\,(\mathbf{s} \cdot \mathbf{v}) - (\mathbf{s} \cdot \mathbf{p})\,(\mathbf{n} \cdot \mathbf{v}) = c_e$

or, $(\mathbf{n} \cdot \mathbf{p})\,c_s - (\mathbf{s} \cdot \mathbf{p})\,c_n = c_e$

Here, $c_n$ is fixed and so the relationship between changes
in $c_e$ and $c_s$ is simply

$dc_e = (\mathbf{n} \cdot \mathbf{p})\,dc_s$

If the interval between cutting planes is $\delta$ and the map
scale is $k$, then $dc_s/s = k\delta$. Consequently the spacing
between lines in the block diagram, $dc_e/e$, is,

$d = k\delta\,(\mathbf{n} \cdot \mathbf{p})\,(s/e)$

where $e$ is the magnitude of the vector
$\mathbf{e} = (\mathbf{n} \cdot \mathbf{p})\,\mathbf{s} - (\mathbf{s} \cdot \mathbf{p})\,\mathbf{n}$. Finally, we remember that

$R(p,q) = r_w - (r_w - r_b)\,(b/d)$

where $b$ is the thickness of the lines. Thus,

$R(p,q) = r_w - (b/k\delta)\,(r_w - r_b)\,(e/s)\,(1/(\mathbf{n} \cdot \mathbf{p}))$

The view vector is tangent to the surface when $\mathbf{n} \cdot \mathbf{p} = 0$.
When this dot product becomes negative, the surface is
turned away from the viewer and should not be visible.
Also note that $d = k\delta$ when $\mathbf{s} \cdot \mathbf{p} = 0$. One should
therefore choose $\mathbf{s}$ and $\mathbf{p}$ so that they are not orthogonal,
to avoid getting only evenly spaced parallel lines.
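
The general spacing formula can be checked numerically. The following is a minimal sketch, assuming vectors are plain 3-tuples; the helper names, the `None` return for hidden facets, and the clipping at the ink reflectance are choices made for this example only.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def block_diagram_spacing(n, s, view, k_delta=1.0):
    """Line spacing d = k*delta*(n.p)*(s/e); None if the facet faces away."""
    n_p = dot(n, view)
    if n_p <= 0.0:
        return None                      # turned away from (or edge-on to) the viewer
    s_p = dot(s, view)
    # normal of the viewing plane: e = (n.p) s - (s.p) n
    e = tuple(n_p * si - s_p * ni for si, ni in zip(s, n))
    e_mag = norm(e)
    if e_mag == 0.0:
        return math.inf                  # cutting planes parallel to the tangent plane
    return k_delta * n_p * norm(s) / e_mag

def block_diagram_reflectance(p, q, s, view, b_over_kdelta, r_w=1.0, r_b=0.0):
    """R = r_w - (r_w - r_b) * b/d, with d measured in units of k*delta."""
    d = block_diagram_spacing((-p, -q, 1.0), s, view)
    if d is None:
        return None
    # Assumption: tone saturates at r_b once adjacent lines overlap.
    return max(r_b, r_w - (r_w - r_b) * b_over_kdelta / d)

# Ordinary contour map: horizontal cutting planes viewed from above.
print(block_diagram_spacing((-3.0, -4.0, 1.0), (0, 0, 1), (0, 0, 1)))
```

With horizontal cutting planes and a vertical view direction the spacing reduces to $k\delta/\sqrt{p^2+q^2}$, the familiar contour interval on the map.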

In the case of perspective projection, line density
will increase with distance, and the resulting reflectance
will be lowered because of a change in the effective scale
factor $k$. If the projected profiles are plotted on a raster
device, one has to also take into account the fact that
the number of dots per unit line length is not constant.
The dot density varies as $\max[|\cos\theta|, |\sin\theta|]$, where $\theta$
is the angle between the line and the direction of the
raster. This variation should be included if an accurate
reflectance map is to be derived for output of this form.

Isometric  Views  of  Vertical  Profiles 

The transformation between the terrain coordinate
system and that of an observer viewing the terrain
obliquely can be found by multiplying a rotation matrix
corresponding to rotation by $\theta_p$ about the x-axis with a
matrix corresponding to rotation by $(\pi/2 + \phi_p)$ about the
z-axis, where $\phi_p$ is the azimuth and $\theta_p$ is the zenith
angle of the direction specified by the vector $\mathbf{p}$. If the
coordinates in the observer's system are $x'$, $y'$, and $z'$, one
finds,

$x' = -\sin\phi_p\,x + \cos\phi_p\,y$

$y' = -\cos\phi_p\cos\theta_p\,x - \sin\phi_p\cos\theta_p\,y + \sin\theta_p\,z$

$z' = \cos\phi_p\sin\theta_p\,x + \sin\phi_p\sin\theta_p\,y + \cos\theta_p\,z$

In the case of orthographic projection, the values of $x'$
and $y'$ are simply multiplied by the map scale $k$, to
determine coordinates in the block diagram.
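
The coordinate change can be written out directly. This is a small sketch under the assumption that angles are given in radians; the function names are illustrative only.

```python
import math

def to_observer(x, y, z, phi_p, theta_p):
    """Rotate terrain coordinates into the observer's system (x', y', z')."""
    sp, cp = math.sin(phi_p), math.cos(phi_p)
    st, ct = math.sin(theta_p), math.cos(theta_p)
    xo = -sp * x + cp * y
    yo = -cp * ct * x - sp * ct * y + st * z
    zo = cp * st * x + sp * st * y + ct * z   # component along the view direction
    return xo, yo, zo

def project_orthographic(x, y, z, phi_p, theta_p, k=1.0):
    """Block-diagram coordinates: scale x' and y' by the map scale k."""
    xo, yo, _ = to_observer(x, y, z, phi_p, theta_p)
    return k * xo, k * yo

# Isometric view from the South-East: view vector (1, -1, 1).
phi_p = math.atan2(-1.0, 1.0)
theta_p = math.acos(1.0 / math.sqrt(3.0))
print(to_observer(1.0, -1.0, 1.0, phi_p, theta_p))
```

A point lying along the view direction should project to the origin of the block diagram, with its full length appearing in $z'$, which gives a quick sanity check on the signs.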

The general formula derived in the last section
applies to all combinations of viewpoint and cutting plane
orientation. It is interesting to look at a few special
cases however. We can, for example, check our earlier
result for the contour interval in an ordinary contour
map. Here $\mathbf{n} = (-p, -q, 1)$, as always, and $\mathbf{s} = (0, 0, 1)$,
since we are considering the intersection of the surface
with horizontal planes. Further, $\mathbf{p} = (0, 0, 1)$ since the
viewer is vertically above the surface. Here then $s = 1$,
$\mathbf{n} \cdot \mathbf{p} = 1$, and $\mathbf{e} = (p, q, 0)$. The line interval is therefore,

$d = k\delta / \sqrt{p^2+q^2}$

The same reflectance map is obtained as before. Slightly
more complicated is the case of Tanaka's inclined
contours where $\mathbf{s} = (-p_0, -q_0, 1)$. Here, again, $\mathbf{n} \cdot \mathbf{p} = 1$,
while,

$\mathbf{e} = (p - p_0,\ q - q_0,\ 0)$ and $s = \sqrt{1+p_0^2+q_0^2}$

The line interval is therefore,

$d = k\delta\,[\sqrt{1+p_0^2+q_0^2} \,/\, \sqrt{(p-p_0)^2 + (q-q_0)^2}\,]$

a result leading to the same reflectance map as the one
derived before.

Finally, consider profiles running West to East, that
is, $\mathbf{s} = (0, 1, 0)$. The resulting traces may be viewed
isometrically from the South-East, a fairly common
arrangement for a block diagram. Then $\mathbf{p} = (1, -1, 1)$.
Consequently, $\mathbf{n} \cdot \mathbf{p} = (1 - p + q)$ and $\mathbf{s} \cdot \mathbf{p} = -1$. Further,
$\mathbf{e} = (-p,\ 1 - p,\ 1)$ and hence,

$d = k\delta\,(1 - p + q) \,/\, (\sqrt{2}\,\sqrt{1 - p + p^2})$

So, if $r_b = 0$ and $r_w = 1$,

$R(p,q) = 1 - \sqrt{2}\,(b/k\delta)\,\sqrt{1 - p + p^2} \,/\, (1 - p + q)$

Similarly, for profiles running South to North, $\mathbf{s} = (1, 0, 0)$, and,

$R(p,q) = 1 - \sqrt{2}\,(b/k\delta)\,\sqrt{1 + q + q^2} \,/\, (1 - p + q)$

At  times  two  orthogonal  sets  of  slicing  planes  will  be 
used,  producing  a mesh  on  the  surface.  The  reflectance 
map  corresponding  to  this  case  can  be  found  by  adding 
the  last  two  formulas  and  subtracting  one  from  the 
result. 
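
The two isometric formulas and their combination into a mesh can be sketched as follows, with $r_b = 0$ and $r_w = 1$ as in the text. The function names, the `None` return for facets hidden from the South-East viewer, and the clipping of the mesh tone at zero are assumptions made for this illustration.

```python
import math

SQRT2 = math.sqrt(2.0)

def r_west_east(p, q, b_over_kdelta):
    """Profiles running West to East, s = (0, 1, 0), viewed from (1, -1, 1)."""
    n_dot_view = 1.0 - p + q
    if n_dot_view <= 0.0:
        return None                       # facet not visible from the South-East
    return 1.0 - SQRT2 * b_over_kdelta * math.sqrt(1.0 - p + p * p) / n_dot_view

def r_south_north(p, q, b_over_kdelta):
    """Profiles running South to North, s = (1, 0, 0), same viewpoint."""
    n_dot_view = 1.0 - p + q
    if n_dot_view <= 0.0:
        return None
    return 1.0 - SQRT2 * b_over_kdelta * math.sqrt(1.0 + q + q * q) / n_dot_view

def r_mesh(p, q, b_over_kdelta):
    """Both sets of slicing planes: add the two maps and subtract one."""
    we = r_west_east(p, q, b_over_kdelta)
    sn = r_south_north(p, q, b_over_kdelta)
    if we is None or sn is None:
        return None
    # Assumption: the tone cannot fall below that of solid ink (0).
    return max(0.0, we + sn - 1.0)

print(r_mesh(0.0, 0.0, 0.1))
```

Subtracting one simply accounts for the fact that each set of lines removes its own share of white paper from a background of reflectance one; overlap at line crossings is ignored.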


Wiechel’s  Contour-Terrace  Model  F 

Imagine a three-dimensional model of the terrain
built by stacking pieces of some material cut according to
the shape of the contours on a topographic map [8]. If
the thickness of the material is chosen correctly the
model will be a scaled approximation of the terrain,
looking a little like a tiered cake. Illuminating this
construction with a distant point source will give rise to a
form of shading since each contour "terrace" casts a
shadow on the one beneath it (see Fig. 13). Wiechel [8]
was the first to analyze the reflectance properties of such
a surface. In order to calculate the average brightness of
a portion of the model, when viewed from above, we
must determine the width of the shadow relative to the
width of the terrace.

The width of the shadow, measured perpendicular
to the contours, varies, depending on the orientation of
contours relative to the direction of the rays from the
source. For example, when measured this way, the width
is zero where the contour is locally parallel to the
projection of the rays on the xy-plane. Measured in a
vertical plane containing the light source however, the
width of the shadow is constant, since the terrace has a
fixed height (see Fig. 14). If the light source has a
zenith angle $\theta_0$, the contour interval is $\delta$, and the map
scale $k$, then,

$\tan\theta_0 = (b/k\delta)$

But, $\tan\theta_0 = \sqrt{p_0^2 + q_0^2}$

Figure 13: Shadows cast in the contour terrace model.
The width of the shadows, measured
perpendicular to the contours, varies with the
direction of the contours relative to the
direction of the incident rays.

Figure 14: Section of the contour terrace model in a
vertical plane containing the light-source.
The width of the shadow, b, measured in this
plane is constant, while the width of the
terrace, b + w, depends on the slope of the
surface in a direction parallel to the
projection of the incident rays on the ground
plane.

To calculate the average brightness we must know
the width, $d$, of the terrace in the model, measured in
the same vertical plane. The slope in this plane evidently
is just

$s = -k\delta/d$

We know that the slope of a surface in the direction
$(p_0, q_0)$ is,

$s = (p_0 p + q_0 q) \,/\, \sqrt{p_0^2 + q_0^2}$

Solving for $d$ from the last two equations and for $b$ from
the two before them, we get

$b/d = -(p_0 p + q_0 q)$

For example, when the local surface normal, $(-p, -q, 1)$,
is perpendicular to the direction to the source,
$(-p_0, -q_0, 1)$, their dot-product is zero and $b/d = 1$. The terrace is
then covered exactly by the shadow. In the above
expression both the contour interval and the map scale
have cancelled, as one might have predicted.

When $(p_0 p + q_0 q) < -1$, shadows coalesce and no
further increase in $b/d$ is possible. When, on the other
hand, $(p_0 p + q_0 q) > 0$, the slope is facing towards the
light  source.  This  means  that  no  shadow  is  cast.  In  this 
model,  shading  only  occurs  on  slopes  facing  away  from 
the  source,  while  those  facing  towards  it  are  all 
uniformly  bright.  This  is  certainly  not  what  one  would 
expect  of  a real  surface  and  suggests  that  the  contour- 
terrace  model  has  some  shortcomings.  This  is  not 
surprising  since  apparent  brightness  depends  on  surface 
orientation,  not  height,  and  while  the  model  represents 
height  with  reasonable  accuracy  it  does  a poor  job  of 
modeling  surface  orientation.  Indeed  the  surface  of  the 
model  is  mostly  horizontal,  with  some  narrow  strips  of  a 
vertical  orientation.  The  latter  are  not  even  visible  from 
above. 

Wiechel noted that light would be reflected from
these vertical surfaces onto the terraces [8]. The surface
thus appears brighter, viewed from above, near vertical
surfaces facing towards the light source. He made the
simplifying assumption that reflection produces uniformly
bright patches with the same shape as shadows that
would be cast were a source to be placed opposite the
actual light source. This is not a reasonable assumption
unless the vertical surfaces are made of narrow mirror
facets, each oriented perpendicular to the direction of the
incident light! In this case, surfaces illuminated by
reflection as well as by direct light have a brightness
twice that of those illuminated only by direct light. This
version of the model is fortunately simple enough to be
amenable to analysis. First note that, if we assume the
surface to be an ideal diffuser, then the brightness of
horizontal surfaces that are neither shadowed nor
illuminated by reflection equals the cosine of the zenith
angle of the source. Therefore, let $r_b = 0$ and $r_w = \cos\theta_0$,
where,

$\cos\theta_0 = 1 / \sqrt{1+p_0^2+q_0^2}$

Then,

$R(p,q) = (1 + p_0 p + q_0 q) \,/\, \sqrt{1+p_0^2+q_0^2}$

Or, $R'(\theta,\phi) = [1 + \tan\theta\tan\theta_0\cos(\phi-\phi_0)]\,\cos\theta_0$

When the source is in the standard position (North-West
at 45°) this becomes,

$R(p,q) = [1 + (p - q)/\sqrt{2}\,] \,/\, \sqrt{2}$

Note that here apparent brightness already becomes equal
to one when the angle of inclination is about 22.5°
towards the light source ($\tan\theta = \sqrt{2} - 1$). This may be contrasted with
the  case  of  the  ideal  diffuser,  to  be  discussed  later,  where 
it  reaches  one  only  for  an  inclination  of  45°.  Wiechel 
used  this  model  as  the  second  approximation  to  the  ideal 
diffuser  (the  first  will  be  discussed  later)  and  expressed 
his result as [8],

$(\cos i / \cos e)$,

where  i is  the  incident  angle,  and  e is  the  emittance 
angle,  here  equal  to  0.  These  angles  will  play  an 
important  role  in  the  discussion  of  more  recent  methods 
later  on. 
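
A minimal sketch of the contour-terrace reflectance map with Wiechel's mirror-facet assumption follows. The function name and default source gradient (North-West at 45°, taken as $p_0 = 1/\sqrt{2}$, $q_0 = -1/\sqrt{2}$) are choices made for this example. Only the lower limit, where shadows coalesce, is taken from the text; no upper limit is imposed here, so values above one can occur for steep slopes facing the source.

```python
import math

def wiechel_terrace(p, q, p0=1 / math.sqrt(2), q0=-1 / math.sqrt(2)):
    """Average brightness of the terrace model seen from above."""
    cos_theta0 = 1.0 / math.sqrt(1.0 + p0 * p0 + q0 * q0)
    t = p0 * p + q0 * q
    # Shadows coalesce when t < -1: the terrace is fully dark from there on.
    t = max(t, -1.0)
    return (1.0 + t) * cos_theta0

print(wiechel_terrace(0.0, 0.0))   # horizontal ground: cos(theta0)
```

Because the result depends on the gradient only through $p_0 p + q_0 q$, contours of constant brightness in gradient space are parallel straight lines perpendicular to the source direction.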

According to Raisz and Imhof [1,28-30] terraced
contour models were used in the late 1800's. An early
example is an alpine excursion map published in 1865
that employed "contour shadows" [1]. The first attempts
at photography of obliquely illuminated surfaces also used
terraced terrain models [27]. Wiechel probably was
influenced by these early efforts when he chose to
develop this method for hill shading.

Wiechel’s  Helligkeitsmaassstab 

Wiechel  based  his  method  for  irregular  surfaces  on 
that  developed  earlier  by  Burmester  for  regular  surfaces 
[9]. In order to make his approach practical he needed a
graphical  device  for  translating  measurements  of  contour 
interval  and  direction  of  steepest  descent  into  gray  tones. 
The  "Helligkeitsmaassstab"  (his  spelling)  is  arranged  so 
that  these  measurements  can  be  transferred  directly,  and 
the  correct  tone  determined  from  a series  of  isophotes, 
contours  of  constant  brightness.  Steep  slopes,  with  small 
contour  intervals  correspond  to  points  near  the  origin  of 
this  diagram,  while  those  of  gentle  slope  map  into  points 
further  away. 

His diagram therefore is a sort of inside-out
reflectance map. The main difference is that radial
distance from the origin in gradient space is proportional
to $\tan\theta$, while it is proportional to $\cot\theta$ in this early
precursor. This corresponds to a conformal mapping
operation referred to as inversion with respect to the unit
circle. Wiechel showed that his diagram corresponded to
the image of an appropriately illuminated logarithmoid
made of the desired material. The equation of this
surface is $z = -\log\sqrt{x^2 + y^2}$. We saw earlier that
the reflectance map can be thought of as the image of a
paraboloid.

It  is  indeed  unfortunate  that  Wiechel’s  construction 
was  ignored.  Wiechel  developed  two  shading  methods 
that  did  not  require  this  two-dimensional  diagram.  In 
each  case  apparent  brightness  depended  only  on  the  slope 
of  the  surface  in  the  direction  away  from  the  light 
source.  This  property  manifests  itself  in  the  reflectance 
map  in  the  form  of  parallel  straight-line  contours.  The 
effect  is  less  apparent  in  Wiechel’s  diagram,  where 
isophotes  become  nested  circles  through  the  origin,  with 
centers  along  the  line  in  the  direction  of  the  light  source. 

Tanaka's  Relief  Contour  Method  G 

Tanaka,  in  1939,  developed  an  ingenious  method 
[24—26]  for  drawing  the  shadows  one  would  see  if  one 
looked  at  a contour-terrace  model.  His  method  is  based 
on  the  observation  that  the  length  of  the  shadow, 
measured  in  the  direction  of  the  incident  rays,  is 
constant. Using a pen with a wide nib one can trace the
contours, while maintaining the orientation of the nib
parallel to the direction of the incident rays (as in
roundhand writing). Only those portions of the contours
are traced that correspond to slopes facing away from the
assumed light source. Tanaka used black ink on gray
paper for reasons that will become apparent. If the
reflectance of this paper is $r_g$ then,

$R(p,q) = r_g + (r_g - r_b)\,(p_0 p + q_0 q)$

provided $(p_0 p + q_0 q) < 0$, otherwise $R(p,q) = r_g$.

Tanaka  also  came  up  with  a way  of  modulating  the 
average  reflectance  of  the  paper  in  areas  that 
corresponded  to  slopes  facing  towards  the  source.  His 
approach  is  somewhat  analogous  to  taking  the  negative  of 
a picture  of  the  contour-terrace  model  obtained  by 
illuminating  it  from  the  other  side.  Thus  white 
"shadows"  are  cast  in  the  opposite  direction  to  the  black 
shadows.  These  can  be  drawn  with  white  ink  on  gray 
paper  using  the  same  method  as  before  except  that  now 
the  section  of  the  contours  that  correspond  to  slopes 
facing  towards  the  light  source  are  traced.  It  is  easy  to 
see  that  the  resulting  average  reflectance  will  be, 

$R(p,q) = r_g - (r_g - r_w)\,(p_0 p + q_0 q)$

where $r_w$ is the reflectance of the white ink. When
$(p_0 p + q_0 q) < 0$, no "shadows" appear and $R(p,q) = r_g$.
Tanaka combined the two methods, tracing contours
using both white and black ink. The corresponding
reflectance map $R(p,q)$ equals one of the expressions
above depending on whether the slope locally faces away
from or towards the assumed source.
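
The combined black-and-white scheme amounts to a two-branch function of $p_0 p + q_0 q$. The sketch below is illustrative only: the function name and default reflectances are assumptions, and the limits at $\pm 1$ are added here on the reasoning that ink strokes cannot cover more than the whole strip between contours (the text states this only for the dark shadows of the terrace model).

```python
import math

def tanaka_relief_contours(p, q, p0, q0, r_g=0.5, r_b=0.0, r_w=1.0):
    """Average reflectance of gray paper with black and white contour strokes."""
    t = p0 * p + q0 * q
    if t < 0.0:
        # Slope faces away from the source: black strokes darken the paper.
        t = max(t, -1.0)                 # assumed saturation (strokes merge)
        return r_g + (r_g - r_b) * t
    # Slope faces the source: white strokes lighten the paper.
    t = min(t, 1.0)                      # assumed saturation, by symmetry
    return r_g - (r_g - r_w) * t

s = 1 / math.sqrt(2)
print(tanaka_relief_contours(0.3, -0.1, s, -s))
```

With the gray level halfway between the two inks both branches have the same slope, so the map becomes a single linear function of $p_0 p + q_0 q$ between the saturation limits.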

He  apparently  also  experimented  with  nibs  of 
different  width  for  white  and  black  ink.  This 
corresponds  to  changing  the  elevation  of  the  assumed 
sources.  If  the  width  of  the  nib  is  b,  then  the 
relationship  is, 

$(b/k\delta) = \tan\theta_0 = \sqrt{p_0^2 + q_0^2}$

The results of this tedious manual method are most
impressive [24-26]. One can write the above expressions
in the alternate notation,

$R'(\theta,\phi) = r_g + (r_g - r_b)\,\tan\theta\tan\theta_0\cos(\phi - \phi_0)$

when $\cos(\phi - \phi_0) < 0$

$R'(\theta,\phi) = r_g - (r_g - r_w)\,\tan\theta\tan\theta_0\cos(\phi - \phi_0)$

when $\cos(\phi - \phi_0) > 0$


Tanaka preferred a reflectance for the gray background
halfway between that of the black ink and the white ink.
Placing the light source in the standard position we get,

$R(p,q) = [1 + (p - q)/\sqrt{2}\,] \,/\, 2$

Or, $R'(\theta,\phi) = [1 + \tan\theta\cos(\phi - \phi_0)] \,/\, 2$

This result can also be expressed as $(\cos i \cos g) / \cos e$,
where $g$ is the phase angle, here equal to $\theta_0$. Note that
except for scaling by $\cos g$, this is the same result as that
obtained by Wiechel for his contour-terrace model. One
effect of this scaling is that apparent brightness rises to
one only when the angle of inclination is 45°. On the
other hand, horizontal surfaces now have a gray value of
only 0.5.

Tanaka's  Hemispherical  Brightness  Distribution 

Tanaka  needed  a way  to  display  the  dependence  of 
tone  on  surface  orientation  to  permit  comparison  of  the 
results  produced  by  his  two  methods  and  what  would  be 
seen  if  the  surface  modeled  were  an  ideal  diffuser.  He 
chose  an  oblique  view  of  the  brightness  distribution  on  a 
spherical  cap  extending  to  45°  inclination  [10,11,24—26]. 

If the cap is increased until it is a hemisphere, one
obtains something like the reflectance map. One
difference is that radial distance from the origin in
gradient space is proportional to $\tan\theta$, while here it is
proportional to $\sin\theta$. Thus, while the reflectance map is
a central projection of the Gaussian sphere onto a
horizontal plane, this is a parallel projection. Put
another way: we are dealing here with an image of a
hemisphere, while the reflectance map is the image of a
paraboloid.

Tanaka's oblique views of the distribution of
brightness versus surface orientation do not provide the
quantitative information available in a contour
representation such as Wiechel's. His method is
nevertheless very helpful and it is unfortunate that few
seem to have paid any attention to it, judging by the
continued use of inappropriate forms. It is not
uncommon for example to see the dependence of tone on
surface orientation shown as a curve depending on one
variable, slope, when it clearly depends on two, slope and
the direction of steepest descent, or equivalently, the two
components of the gradient.

Lambertian Surfaces H

We  now  turn  from  graphical  methods  using 
variation  in  line  spacing  and  line  thickness  to  those 
utilizing  continuous  tone  or  halftone  techniques.  These 
are  often  based  on  a model  of  what  the  terrain  would 
look  like  were  it  made  of  some  ideal  material,  illuminated 
from  a predetermined  direction.  The  result  differs  from 
an  aerial  photograph,  since  no  account  is  taken  here  of 
varying  terrain  cover,  the  light  source  is  often  placed  in  a 
position  that  is  astronomically  impossible,  and  the  terrain 
model  has  been  smoothed  and  generalized.  Not  being 
like  an  aerial  photograph  is  an  advantage,  since  aerial 
photographs, taken with the sun fairly high in the sky,
often  do  not  provide  for  easy  (monocular)  comprehension 
of  surface  topography. 

The amount of light captured by a surface patch
will depend on its inclination relative to the incident
beam. As seen from the source the surface is
foreshortened, its apparent (or projected) area equal to
its true area multiplied by the cosine of the incident
angle. Thus the irradiance is proportional to $\cos i$.
Strangely, it is commonly assumed that the radiance
(apparent brightness) of the surface patch is also
proportional to $\cos i$. This is generally not the case since
light may be reflected differently in different directions,
as can be seen by considering a specularly reflecting
material.

One can however postulate an ideal surface that
reflects all light incident on it and appears equally bright
from all viewing directions. Such a surface is called an
ideal diffuser or Lambertian reflector and has the
property that its radiance equals the irradiance divided by
$\pi$ [141,142]. In this special case the radiance is
proportional to the cosine of the incident angle. No real
surface  behaves  exactly  like  this,  although  pressed 
powders  of  highly  transparent  materials  like  barium 
sulfate  and  magnesium  carbonate  come  close.  Matte 
white  paint,  opal  glass,  and  rough  white  paper  are 

somewhat  worse  approximations  as  is  snow  (IX)].  Most 
proposed  schemes  for  automatic  hill-shading  are  based  on 
models  of  brightness  distribution  on  ideally  diffusing 
surfaces  [8,10,11,51—59,74,76,77],  even  though  there  is  no 
evidence  that  perception  of  surface  shape  is  optimized  by 
this  choice  of  reflectance  model.  As  we  will  see, 
reflectance  calculations  based  on  this  model  are  not 
particularly  simple  either. 

The cosine of the incident angle can be found by
considering the appropriate spherical triangle (see Fig. 15)
formed by the local normal, N, the direction towards the
source, S, and the vertical, V. One then finds, as Wiechel
already showed [8],

$\cos i = \cos\theta_0\cos\theta + \sin\theta_0\sin\theta\cos(\phi - \phi_0)$

Figure 15: Spherical triangle used in calculating the
incident angle, i, from the azimuth and
elevation of the light-source and the azimuth
and elevation of the surface normal. The
direction towards the viewer is V, the
direction to the source is S, while the surface
normal is N.

Alternatively one can simply take the dot-product of the
unit vector, N, normal to the surface and the unit vector,
S, pointing towards the source [137,139],

$\cos i = \dfrac{(-p, -q, 1) \cdot (-p_0, -q_0, 1)}{\sqrt{1+p^2+q^2}\,\sqrt{1+p_0^2+q_0^2}}$


The reflectance map (normalized so that its maximum is
one) then is

$R(p,q) = (1 + p_0 p + q_0 q) \,/\, (\sqrt{1+p^2+q^2}\,\sqrt{1+p_0^2+q_0^2})$

When $(1 + p_0 p + q_0 q) < 0$ the surface element is turned
away from the source and is self-shadowed. In this case,
$R(p,q) = 0$.

In the case of a point source of light at 45° zenith
angle in the North-West, the reflectance map becomes

$R(p,q) = [1 + (p - q)/\sqrt{2}\,] \,/\, [\sqrt{2}\,\sqrt{1+p^2+q^2}\,]$
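
The Lambertian reflectance map is short enough to state as code. This is a sketch with an illustrative function name; the gradient convention follows the text, with the source direction $(-p_0, -q_0, 1)$.

```python
import math

def lambertian(p, q, p0, q0):
    """Cosine of the incident angle, set to zero for self-shadowed facets."""
    num = 1.0 + p0 * p + q0 * q
    if num <= 0.0:
        return 0.0                       # facet turned away from the source
    return num / (math.sqrt(1.0 + p * p + q * q) *
                  math.sqrt(1.0 + p0 * p0 + q0 * q0))

s = 1 / math.sqrt(2)                     # source in the North-West at 45 degrees
print(lambertian(0.0, 0.0, s, -s))       # horizontal ground
print(lambertian(s, -s, s, -s))          # facet squarely facing the source
```

The two square roots are what make this map comparatively costly to evaluate for every picture cell.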


Peucker’s  Piecewise  Linear  Approximation  I 

The  computation  of  gray  value  using  the  equation 
for  the  cosine  of  the  incident  angle  is  complicated  and 
slow  because  of  the  appearance  of  the  square  root. 
Peucker  [63]  experimented  with  a number  of 
approximations  that  are  easier  to  compute.  He  found 
that  an  adequate,  piecewise  linear  approximation  for 
slopes  less  than  one,  is 

$R(p,q) = 0.3441\,p - 0.5129\,q + 0.6599$ for $p + q > 0$

$R(p,q) = 0.5129\,p - 0.3441\,q + 0.6599$ for $p + q < 0$

or, $R(p,q) = 0.4285\,(p - q) - 0.0844\,|p + q| + 0.6599$

where $|p + q|$ denotes the absolute value of $(p + q)$. The
above approximation produces excellent shaded overlays,
that in fact seem easier to interpret than those produced
using the exact equation for a perfectly diffusing surface.
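
The single-expression form and the two-branch form of the approximation are equivalent, which the sketch below makes explicit. The function names are illustrative, and any clipping of the result to the displayable range is left to the caller.

```python
def peucker(p, q):
    """Piecewise linear approximation written as a single expression."""
    return 0.4285 * (p - q) - 0.0844 * abs(p + q) + 0.6599

def peucker_piecewise(p, q):
    """The same approximation written as two linear pieces."""
    if p + q > 0:
        return 0.3441 * p - 0.5129 * q + 0.6599
    return 0.5129 * p - 0.3441 * q + 0.6599

for p, q in [(0.0, 0.0), (0.5, 0.2), (-0.4, -0.3), (0.3, -0.3)]:
    print(p, q, peucker(p, q), peucker_piecewise(p, q))
```

Only multiplications, additions and one absolute value are needed, in contrast to the square roots in the exact expression for an ideal diffuser.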

Brassel’s  Adjustment  Of  Light  Source  Position  J 


Perhaps the most outstanding examples of shaded
maps come from Switzerland. Techniques for portraying
the shape of the surface and integrating this information
with planimetric detail have been perfected by a number
of artists there [1,42-50]. The results of automated
methods as described here cannot compete with the
beauty of their products. Nevertheless, automated
methods do provide a systematic, accurate way for
generating shaded overlays. They will become of
particular importance when good digital terrain models
become easily available. Brassel attempted to incorporate
as much as possible of the Swiss manner into his program
[73-77]. He quickly realized two problems with methods
based purely on Lambertian reflectance models.

The first effect is explained as follows. Surface
elements sloping away from the source are dark, while
those tilted towards the source are brighter. Brightest
are those that have the light rays falling perpendicularly
on the surface. Surface elements sloped more steeply
however, become darker again. This lack of
monotonicity of brightness with slope is apparently
disturbing and reduces the ability of the observer to
interpret correctly the shape. Brassel ameliorated this
effect by reducing the elevation of the light source in
regions where this problem occurred.

If the zenith angle of the source, $\theta_0$, is smaller than
the zenith angle of the direction defined by the surface
normal, $\theta$, he moves the source to a new zenith angle, $\theta_n$,
that is a weighted average of $\theta_0$ and $\theta$. To be precise,

$\theta_n = \max[\theta_0,\ a\theta + (1 - a)\theta_0]$

where, $\theta = \tan^{-1}\sqrt{p^2+q^2}$

In his thesis [73], the weighting factor $a$ was one, so
that adjustment in elevation was complete. Curiously,
this simple method has the effect of lowering the light
source even for surface elements tilted away from the
source, as long as the slope is large enough. The above
method can also be expressed directly in terms of the
components of the gradient. When $p^2+q^2 > p_0^2+q_0^2$,

$p_n = p_0\,\sqrt{p^2+q^2} \,/\, \sqrt{p_0^2+q_0^2}$

$q_n = q_0\,\sqrt{p^2+q^2} \,/\, \sqrt{p_0^2+q_0^2}$

where $p_n$ and $q_n$ are the components of the gradient of a
surface element oriented to be maximally illuminated by
the adjusted light source. If there are no further
adjustments of source position, the reflectance map in the
specified region becomes,

$R(p,q) = [1 + (p_0 p + q_0 q)\,\sqrt{p^2+q^2} \,/\, \sqrt{p_0^2+q_0^2}\,] \,/\, (1 + p^2 + q^2)$
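
The elevation adjustment can be sketched as a Lambertian map evaluated with a source whose zenith angle depends on the local slope. The function name is illustrative, and the treatment of an overhead source (rejected, since its azimuth is undefined) is an assumption of this example rather than something taken from Brassel's program.

```python
import math

def brassel_elevation(p, q, p0, q0, a=1.0):
    """Lambertian reflectance after lowering the source on steep slopes."""
    g0 = math.hypot(p0, q0)
    if g0 == 0.0:
        raise ValueError("source directly overhead: azimuth undefined")
    theta0 = math.atan(g0)                    # zenith angle of the source
    theta = math.atan(math.hypot(p, q))       # zenith angle of the normal
    theta_n = max(theta0, a * theta + (1.0 - a) * theta0)
    # Keep the azimuth of the source, change only its zenith angle.
    scale = math.tan(theta_n) / g0
    pn, qn = p0 * scale, q0 * scale
    num = 1.0 + pn * p + qn * q
    if num <= 0.0:
        return 0.0
    return num / (math.sqrt(1.0 + p * p + q * q) *
                  math.sqrt(1.0 + pn * pn + qn * qn))

s = 1 / math.sqrt(2)
print(brassel_elevation(math.sqrt(2), -math.sqrt(2), s, -s))   # steep, facing source
```

With $a = 1$ a facet facing the source stays at full brightness however steep it is, which removes the darkening of over-steep slopes described above.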


Adjustment  of  the  Azimuth  of  the  Source 

Next, Brassel observed that ridge and stream lines
become indistinct when their direction is more or less
aligned with a direction toward the source. Opposite
faces  of  a mountain  or  valley  may  end  up  with  similar 
gray  values  when  the  cosine  of  the  incident  angle  is 
similar  for  the  two,  even  though  they  have  quite  different 
surface  orientations.  Maximum  contrast  occurs  when  a 
linear  feature  lies  at  right  angles  to  the  direction  of  the 
incident  light,  and  Brassel  therefore  moves  the  light 
source  in  azimuth  towards  the  local  direction  of  steepest 
ascent  or  descent  (whichever  is  closer). 

Figure 16: Sawtooth function giving adjustment of
azimuth of the light source as a function of
the angle between "regional" ridge and valley
directions and the direction of the light
source in Brassel's scheme.

The amount of adjustment depends on two
parameters (see Fig. 16). The maximum amount of
adjustment is specified by w (55° for example), while the
azimuth difference at which this maximum occurs is
specified by g (80° for example). The details of the
computation are not very important but are given here
for completeness. First, the azimuth of the direction of
steepest ascent is computed using

$$\phi = \operatorname{atan}(-q, -p)$$

where atan(y, x) is the direction of the line from the
origin to the point (x, y) measured counter-clockwise from
the x-axis. Next, the difference between $\phi$ and the
azimuth of the source, $\phi_0$, is reduced to the range
$-\pi/2$ to $+\pi/2$ by adding or subtracting integer
multiples of $\pi$. Let the result be $\Delta\phi$. The
adjusted azimuth of the source is then calculated as follows,

$$\phi_n = \phi_0 + w \,\operatorname{sign}(\Delta\phi)\, \min\left[\, |\Delta\phi|/g,\ (\pi/2 - |\Delta\phi|)/(\pi/2 - g) \,\right]$$

where sign($\Delta\phi$) is +1 when $\Delta\phi > 0$, and -1 when
$\Delta\phi < 0$. Now one can calculate the gradient
$(p_n, q_n)$ of the maximally illuminated surface element,
or instead, use Wiechel's formula to get the cosine of the
incident angle directly,

$$R'(\theta,\phi) = \cos\theta_n \cos\theta + \sin\theta_n \sin\theta \cos(\phi - \phi_n)$$
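
The azimuth adjustment above can be sketched in a few lines of Python. This is only an illustration of the sawtooth rule as stated in the text; the function names are hypothetical, angles are assumed to be in radians, and the adjusted zenith angle theta_n is taken as a given input.

```python
import math


def wrap_to_half_pi(angle):
    """Reduce an angle to [-pi/2, +pi/2] by removing integer multiples of pi."""
    return angle - math.pi * round(angle / math.pi)


def adjusted_azimuth(p, q, phi0, w, g):
    """Adjusted source azimuth phi_n for a gradient (p, q).

    phi0 is the nominal source azimuth, w the maximum adjustment and
    g the azimuth difference at which that maximum occurs (radians).
    """
    phi = math.atan2(-q, -p)                # phi = atan(-q, -p)
    dphi = wrap_to_half_pi(phi - phi0)      # difference reduced to +/- pi/2
    sign = (dphi > 0) - (dphi < 0)          # evaluates to 0 when dphi == 0
    half_pi = math.pi / 2
    sawtooth = min(abs(dphi) / g, (half_pi - abs(dphi)) / (half_pi - g))
    return phi0 + w * sign * sawtooth


def wiechel_cos_incident(theta, phi, theta_n, phi_n):
    """Cosine of the incident angle from zenith/azimuth pairs."""
    return (math.cos(theta_n) * math.cos(theta)
            + math.sin(theta_n) * math.sin(theta) * math.cos(phi - phi_n))
```

With w = 55° and g = 80°, a surface whose azimuth differs from the source by exactly g receives the full adjustment w, and the adjustment falls back to zero at differences of 0 and 90°.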


Here  it  should  be  pointed  out  that  in  Brassel’s 
scheme  the  gradient,  (p,q),  used  in  the  above  formulas 
for  adjusting  the  azimuth  of  the  source  is  a regional 
value  derived  from  ridge  and  stream  lines  in  the  area 
near  a particular  point.  In  this  way  the  cartographer  can 
influence  the  final  appearance  of  the  shaded  overlay  by 
altering  these  manually  entered  linear  features.  This 
method  involves  rather  complicated  global  calculations 
that  do  not  lend  themselves  to  implementation  in  the 
straightforward  way  we  have  discussed.  The  apparent 
brightness  of  a surface  element  depends  on  both  its 
orientation  and  some  function  of  its  surround. 

A possible  objection  to  this  idea  is  that  the 
distribution  of  light  sources  does  not  vary  from  place  to 
place  in  a real  imaging  situation  unless  the  sources  are 
very  close  to  the  surface.  It  must  be  pointed  out, 
however,  that  people  seem  to  have  little  difficulty 
interpreting  synthetic  images  where  the  assumed  light 
source  position  varies.  In  fact,  few  notice  such  drastic 
changes  in  assumed  light  source  position  as  are  apparent 
in a recent map of the polar regions of Mars [147]. This
may  be  related  to  the  fact  that  our  perception  of  shaded 
images  does  not  give  us  a good  appreciation  for  global 
differences  in  depth,  instead  giving  us  an  excellent 
appreciation  of  local  surface  orientation  patterns. 

Whatever  the  merits  of  this  argument,  the  above 
method  can  be  modified  to  fit  in  with  the  notion  of  the 
reflectance  map,  as  defined  earlier,  if  one  uses  the  local 
gradient  ( p,q ) in  the  calculation  of  the  adjusted  source 
position.  The  illustration  shown  here  uses  this  modified 
version.  Note  that  in  Brassel’s  scheme  the  adjustment  in 
azimuth  and  zenith  angle  of  the  source  are  independent 
and  can  be  carried  out  in  either  order. 

Brassel  also  adjusted  the  apparent  brightness 
according  to  the  height  of  the  terrain.  This  is  a simple 
local  computation  that  can  be  easily  added  to  any  of  the 
basic  methods  presented  here.  It  was  not  included  here 
to  simplify  comparisons. 

Alternate  Light  Source  Adjustment  Method  K 

Brassel  used  a piecewise  linear  adjustment  in 
azimuth.  A similar  effect  can  be  achieved  using  a 
smoothly  varying  function  like 

$$\sin\delta\phi = (\alpha/2) \sin 2(\phi - \phi_0) = \alpha \sin(\phi - \phi_0) \cos(\phi - \phi_0)$$

That is,

$$\sin\delta\phi = \alpha\, \frac{(p_0 q - q_0 p)\,(p_0 p + q_0 q)}{(p^2 + q^2)\,(p_0^2 + q_0^2)}$$

Adjusting the azimuth of the source by $\delta\phi$ leads to a
new position specified by,

$$p_s = p_0 \cos\delta\phi - q_0 \sin\delta\phi$$
$$q_s = p_0 \sin\delta\phi + q_0 \cos\delta\phi$$

Adjustment is complete for small angles when $\alpha = 1$.
The use of trigonometric functions is avoided in the
above calculation, since both the sine and the cosine of
$\delta\phi$ can be computed without them.

Next  we  turn  to  the  adjustment  in  the  elevation  of 
the source. To avoid the peculiar phenomenon of the
lowering of the source even for surface elements turned
away from it, we adjust the elevation according to the
projection of the surface normal on a plane containing
the source. When $p_s p + q_s q > p_s^2 + q_s^2$,

$$p_n = p_s\,(p_s p + q_s q) / (p_s^2 + q_s^2)$$
$$q_n = q_s\,(p_s p + q_s q) / (p_s^2 + q_s^2)$$

In this region then the reflectance map becomes,

$$R(p,q) = \frac{\sqrt{1 + (p_s p + q_s q)^2 / (p_s^2 + q_s^2)}}{\sqrt{1 + p^2 + q^2}}$$

Otherwise it is calculated as before, that is, the cosine of
the incident angle is

$$R(p,q) = \frac{1 + p_n p + q_n q}{\sqrt{1 + p^2 + q^2}\,\sqrt{1 + p_n^2 + q_n^2}}$$

The  advantage  of  the  above  method  of  adjustment  is  that 
simple  calculations  in  terms  of  the  components  of  the 
gradient  replace  trigonometric  equations  in  terms  of 
azimuth  and  zenith  angles. 
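
A minimal Python sketch of this alternate adjustment follows, written purely in terms of gradient components. The function name is hypothetical. Two choices are assumptions of the sketch rather than statements from the text: outside the steep region the unadjusted elevation is used, that is (p_n, q_n) = (p_s, q_s), and negative values are clamped to zero.

```python
import math


def alternate_adjusted_reflectance(p, q, p0, q0, alpha=1.0):
    """Reflectance with smooth azimuth adjustment and elevation adjustment."""
    g2 = p * p + q * q
    s2 = p0 * p0 + q0 * q0
    # sin(delta phi) from gradient components only; zero for a flat surface
    if g2 == 0.0 or s2 == 0.0:
        sin_d = 0.0
    else:
        sin_d = alpha * (p0 * q - q0 * p) * (p0 * p + q0 * q) / (g2 * s2)
    cos_d = math.sqrt(max(0.0, 1.0 - sin_d * sin_d))  # no trig calls needed
    # rotate the source position in azimuth
    ps = p0 * cos_d - q0 * sin_d
    qs = p0 * sin_d + q0 * cos_d
    along = ps * p + qs * q
    ss = ps * ps + qs * qs
    if ss > 0.0 and along > ss:
        # steep region: the source is lowered along its own azimuth
        return math.sqrt(1.0 + along * along / ss) / math.sqrt(1.0 + g2)
    # assumption: otherwise use (p_n, q_n) = (p_s, q_s) and clamp at zero
    value = (1.0 + along) / (math.sqrt(1.0 + g2) * math.sqrt(1.0 + ss))
    return max(0.0, value)
```

For gradients along the source azimuth that are steeper than the source gradient, the value stays at one, which is the monotonic behavior the elevation adjustment is meant to produce.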


Wiechel's First Approximation L M

The first serious analysis of an approach based on
the shading seen on the surface of an obliquely
illuminated matte object is that of Wiechel [8]. He
started by assuming a perfectly diffusing surface and
proposed connecting points of equal apparent brightness
by isophotes. He correctly determined the brightness of
a perfect diffuser as already mentioned. In order to
make calculations less unwieldy he also suggested three
approximations, the second of these being the
contour-terrace model already discussed. His first method
involved approximating the cosine of the incident angle, i,
by the cosine of the projection of this angle onto a
vertical plane lying parallel to the rays (see Fig. 17).


Figure 17: Projection of the surface normal on a vertical
plane containing the assumed light-source.
The projected normal is perpendicular to the
line in which the plane cuts the terrain
surface.


Figure 18: Spherical triangles used to calculate the
projected incident angle, i', and the projected
surface inclination, θ'. The direction
towards the viewer is V, the direction to the
source is S, while the surface normal is N.


By applying the analogue formulas to the lower spherical
triangle (see Fig. 18) we get,

$$\sin i' \cos i = \cos i' \sin i \cos x$$

Applying the analogue formulas next to the whole
triangle we get

$$\sin i \cos x = \cos\theta \sin\theta_0 - \sin\theta \cos\theta_0 \cos(\phi - \phi_0)$$

The second equation allows us to eliminate x from the
first and obtain an expression for $\tan i'$. Using the
identity $\cos i' = 1/\sqrt{1 + \tan^2 i'}$ we finally find,

$$R'(\theta,\phi) = \cos i \,/\, \left[\cos\theta \sqrt{1 + \tan^2\theta \cos^2(\phi - \phi_0)}\,\right]$$

where, using the cosine formula as before,

$$\cos i = \cos\theta \cos\theta_0 + \sin\theta \sin\theta_0 \cos(\phi - \phi_0)$$

Alternatively one can project the vector $n = (-p, -q, 1)$
onto the plane with normal $s = (q_0, -p_0, 0)$. The result
will equal,

$$n' = n - (n \cdot s)\, s / s^2$$

where s is the magnitude of the vector s. This projected
vector will be perpendicular to the line in which a
vertical plane including the light source cuts the surface.

$$n' = \left[\, -p_0 (p_0 p + q_0 q)/(p_0^2 + q_0^2),\ -q_0 (p_0 p + q_0 q)/(p_0^2 + q_0^2),\ 1 \,\right]$$

Taking the dot-product of the projected vector and the
vector pointing at the source, then dividing by their
magnitudes we find,

$$R(p,q) = \frac{1 + p_0 p + q_0 q}{\sqrt{1 + p_0^2 + q_0^2}\,\sqrt{1 + (p_0 p + q_0 q)^2 / (p_0^2 + q_0^2)}}$$

While these equations are more complicated than
the original equations for the cosine of the incident angle,
i, it must be pointed out that the angle i' can be
estimated graphically by measuring the contour interval
in a direction parallel to the incident light rays. The
same is true of Wiechel's second approximation
introduced earlier. This greatly simplifies the manual
construction of shaded maps from contour maps, and
makes it possible to use a simple one-dimensional scale
for brightness instead of Wiechel's more elaborate
"Helligkeitsmaassstab". This property manifests itself in
the reflectance map by the appearance of parallel straight
line contours. It is also interesting to note that Wiechel's
"approximations" produce results that seem better than
those obtained using the equation for the perfect diffuser.
Unfortunately,  experimentation  at  his  time  was  limited 
because  of  the  lack  of  appropriate  technology  for 
systematically  generating  continuous  tone  patterns. 
Apparently  no  maps  made  by  this  method  were  ever 
published  [1]. 
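
The gradient-space form of Wiechel's first approximation is easy to compare against the perfect diffuser numerically. The sketch below is illustrative only; the function names are hypothetical and the clamping of negative values to zero is an assumption made here.

```python
import math


def lambertian(p, q, p0, q0):
    """Cosine of the incident angle for a perfect diffuser, clamped at zero."""
    num = 1.0 + p0 * p + q0 * q
    den = math.sqrt(1.0 + p * p + q * q) * math.sqrt(1.0 + p0 * p0 + q0 * q0)
    return max(0.0, num / den)


def wiechel_first(p, q, p0, q0):
    """Cosine of the incident angle projected onto the vertical source plane."""
    along = p0 * p + q0 * q                      # depends on (p, q) only via this
    proj = along * along / (p0 * p0 + q0 * q0)   # squared slope toward the source
    num = 1.0 + along
    den = math.sqrt(1.0 + p0 * p0 + q0 * q0) * math.sqrt(1.0 + proj)
    return max(0.0, num / den)
```

Because the result depends on (p, q) only through p0 p + q0 q, contours of constant value are parallel straight lines in gradient space, and the two functions agree wherever (p, q) is a multiple of (p0, q0).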

Finally,  Wiechel  postulated  a material  that  would 
not  appear  equally  bright  from  all  viewing  directions,  but 
instead  had  brightness  varying  as  the  cosine  of  the 
emittance  angle.  This  was  used  in  part  to  discuss  the 
relationship  between  the  contour-terrace  model  and  the 
original  surface,  but  also  put  forward  as  a third, 
"modified  brightness"  model  that  might  be  used  in 
calculating  gray  tone.  In  this  case  brightness  varies  in 
proportion to (cos i cos e). We can normalize his result
here by dividing by the maximum of this product,
$\cos^2(g/2)$, where g is the so-called phase angle, here
equal to $\theta_0$ (the term phase angle stems from work on
lunar photometry, where this angle equals the phase of
the moon). Then,

$$R(p,q) = \frac{2 \cos i \cos e}{1 + \cos g} = \frac{2\,(1 + p_0 p + q_0 q)}{\left(1 + \sqrt{1 + p_0^2 + q_0^2}\right)(1 + p^2 + q^2)}$$

Incidentally,  this  function  does  not  satisfy  Helmholtz’s 
reciprocity  law  [123],  and  therefore  cannot  correspond  to 
the  reflectance  of  any  real  surface  illuminated  by  a point 
source. 
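
As a quick numerical check of the normalization, the "modified brightness" map can be evaluated directly from the gradient. The function name below is hypothetical and clamping at zero is an assumption of this sketch.

```python
import math


def modified_brightness(p, q, p0, q0):
    """Wiechel's cos i cos e model, normalized by its maximum cos^2(g/2)."""
    num = 2.0 * (1.0 + p0 * p + q0 * q)
    den = (1.0 + math.sqrt(1.0 + p0 * p0 + q0 * q0)) * (1.0 + p * p + q * q)
    return max(0.0, num / den)
```

For the standard source position the maximum of one is reached where the normal bisects the directions to source and viewer, that is at (p, q) = (√2 - 1)(p0, q0), while a horizontal surface gets 2/(1 + √2).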


The expression for Wiechel's first approximation given
earlier matches the expression for perfectly diffuse
reflection for values of (p,q) along the line from the
origin to the source point $(p_0, q_0)$. When the source is in
the standard position the equation becomes

$$R(p,q) = \frac{1 + (p - q)/\sqrt{2}}{\sqrt{2}\,\sqrt{1 + (p - q)^2/2}}$$


Marsik's Automatic Relief Shading N

Blachut and Marsik further modified Wiechel's
approximation, partly as a result of their dissatisfaction
with the fact that a horizontal surface does not appear
white when a perfectly diffusing material is assumed
[60,61]. This may have stemmed in part from early
conventions in map-making where horizontal surfaces
were portrayed without hachures [4-6]. Marsik also
aimed for simpler calculations and considered the slope in
the direction towards the source. For some reason, he
proposed making the density of the printed result equal to
the tangent of the projected slope angle θ' (see Fig. 17).
Density is the logarithm (base 10) of the reciprocal of
the reflectance. Applying the analogue rule to the upper
spherical triangle (see Fig. 18) one can show that,

$$0 = \cos\theta \sin\theta' - \sin\theta \cos\theta' \cos(\phi - \phi_0)$$

Thus, $\tan\theta' = \tan\theta \cos(\phi - \phi_0)$,

and,

$$R'(\theta,\phi) = 10^{\tan\theta \cos(\phi - \phi_0)}$$

Using the expression for the projected normal n'
developed in the last section, or, remembering the
expression for the slope in the direction $(p_0, q_0)$, one can
also show,

$$R(p,q) = 10^{(p_0 p + q_0 q)/\sqrt{p_0^2 + q_0^2}}$$

When $p_0 p + q_0 q > 0$, R(p,q) > 1 and so all surfaces
facing towards the light source are white. No
information is available to the viewer regarding surface
shape in these areas. If the assumed light source is in the
standard position we get the simple formula,

$$R(p,q) = 10^{(p - q)/\sqrt{2}}$$

Marsik also limited the density to a maximum of 0.7 to
avoid interference with planimetric information on the
map.

Lommel-Seeliger Law O

Many surfaces have reflectance properties that
differ greatly from those of an ideal diffuser. The
photometry of rocky planets and satellites has intrigued
astronomers for many years [120-126]. Several models
have been proposed to explain the observed behavior.
One of the earliest, developed by Lommel [118] and
modified by Seeliger [119], is based on an analysis of
primary scattering in a porous surface [123,127]. Their
model consists of a random distribution of similar
particles suspended in a transparent medium and results
in a reflectance function that is given here in its simplest
form.

$$1 \,/\, [\,1 + (\cos e / \cos i)\,]$$

unless cos i < 0, when the surface is self shadowed.
Here i is the incident angle, and e is the emittance angle,
the angle between the local surface normal and the
direction to the viewer, here equal to $\theta$. The expression
equals 1/(1 + cos g) when i = 0, where g is the phase
angle, here equal to $\theta_0$. Using this value for
normalization and remembering the expression for cos i
one finds,

$$R'(\theta,\phi) = \frac{1 + \cos\theta_0}{1 + \dfrac{\cos\theta}{\cos\theta \cos\theta_0 + \sin\theta \sin\theta_0 \cos(\phi - \phi_0)}}$$

$$R(p,q) = \frac{1 + 1/\sqrt{1 + p_0^2 + q_0^2}}{1 + \sqrt{1 + p_0^2 + q_0^2}\,/\,(1 + p_0 p + q_0 q)}$$

unless $(1 + p_0 p + q_0 q) < 0$, when R(p,q) = 0. When the
source is in the standard position,

$$R(p,q) = \frac{(1 + 1/\sqrt{2})\,[\,1 + (p - q)/\sqrt{2}\,]}{(1 + \sqrt{2}) + (p - q)/\sqrt{2}}$$

The Lommel-Seeliger law has been used in automated
relief shading by Batson, Edwards and Eliason [72].
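
Both Marsik's density rule and the normalized Lommel-Seeliger law reduce to a few arithmetic operations on the gradient. The following is a rough sketch with hypothetical function names. Clipping Marsik's reflectance to white at one, and treating a zero value of 1 + p0 p + q0 q as self shadowed, are choices made for this sketch.

```python
import math


def marsik(p, q, p0, q0, max_density=0.7):
    """Marsik's shading: reflectance 10**slope, with density limited to 0..0.7."""
    slope = (p0 * p + q0 * q) / math.sqrt(p0 * p0 + q0 * q0)
    # density is log10(1/R) = -slope; surfaces facing the source saturate at white
    density = min(max(-slope, 0.0), max_density)
    return 10.0 ** (-density)


def lommel_seeliger(p, q, p0, q0):
    """Normalized Lommel-Seeliger reflectance in gradient-space form."""
    c = 1.0 + p0 * p + q0 * q
    if c <= 0.0:
        return 0.0                      # self shadowed
    root = math.sqrt(1.0 + p0 * p0 + q0 * q0)
    return (1.0 + 1.0 / root) / (1.0 + root / c)
```

The Lommel-Seeliger value equals one when the surface gradient coincides with the source gradient (i = 0), which is the normalization used in the text.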

Based  on  detailed  measurements  and  modeling, 
Fesenkov  [122,126]  and  later  Hapke  [127-129]  further 
improved  the  equations  for  the  reflectance  of  the  material 
in  the  maria  of  the  moon.  Hapke  imagined  the  surface 
as  an  open  porous  network  into  which  light  can  penetrate 
freely  from  any  direction.  His  result  has  three 
components:  the  Lommel-Seeliger  formula  for  reflection 
from  a surface  layer  containing  many  scattering  points  of 
low reflectance, Schönberg's formula [121] for reflection
from  a Lambertian  sphere  and  a complicated  factor 
resulting  from  mutual  obscuration  of  the  particles.  The 
results  of  such  investigations  are  often  expressed  in  terms 
of  angles  other  than  the  ones  introduced  so  far. 


Luminance Longitude and Luminance Latitude

Another convention for specifying the orientation of
the surface element relative to the direction of a light
source and the viewer has become established in the work
on planetary and lunar photometry. Imagine a sphere
illuminated by a light source above the point S, viewed
by an observer above the point V (see Fig. 19). These
two points define a great circle that we take to be the
equator. Then, points on the sphere can be referenced
using the longitude, $\alpha$, measured from the point V along
the equator, and the latitude, $\beta$.


Figure 19: Luminance longitude $\alpha$ and luminance
latitude $\beta$ of a surface element are defined as
the longitude and latitude of a patch on a
sphere with the same orientation. Longitude
and latitude are measured relative to the
luminance equator through the light source S
and the viewer V.


All possible surface orientations can be found on
the sphere, and each surface orientation can be identified
with some point, N say. The luminance longitude and
luminance latitude corresponding to a particular surface
orientation are the longitude and latitude of N. It is not
difficult to show that,

$$\cos e = \cos\beta \cos\alpha \quad\text{and}\quad \cos i = \cos\beta \cos(\alpha + g)$$

$$\tan^2\beta = \frac{1 + 2IEG - (I^2 + E^2 + G^2)}{I^2 - 2IEG + E^2}$$

where we have used the shorthand notation, I = cos i,
E = cos e, and G = cos g. These results can also be
expressed in terms of the components of the gradient:


So tan $\alpha$ is simply the slope in the direction away from
the source. Now,

$$1 + 2IEG - (I^2 + E^2 + G^2) = \frac{(q_0 p - p_0 q)^2}{(1 + p^2 + q^2)(1 + p_0^2 + q_0^2)}$$

$$I^2 - 2IEG + E^2 = \frac{(p_0 p + q_0 q)^2 + (p_0^2 + q_0^2)}{(1 + p^2 + q^2)(1 + p_0^2 + q_0^2)}$$

The Lommel-Seeliger law can be expressed in terms of
luminance longitude and luminance latitude as,

$$\cos(\alpha + g) \,/\, [\,\cos\alpha + \cos(\alpha + g)\,]$$

and it is clear from this form that scene radiance is
independent of luminance latitude. This simplifies the
problem of calculating the shape of the lunar surface
from shading in a single image [136,137].


Minnaert's Reflectance Function

Minnaert discusses a large variety of models for the
reflection of light from rough surfaces [125]. He also
proposed a class of simple functions of the form,

$$\cos^k i \, \cos^{k-1} e$$

intended to fit observations of the radiance of lunar
material while obeying the reciprocity law [123]. Here k
is a parameter to be chosen so that the best fit with
experimental data is obtained. This parameter is meant
to lie between zero and one, with the above expression
becoming equal to that for the perfect diffuser when
k = 1. We can normalize this expression so it equals one
when i = 0,

$$R(p,q) = \left[\frac{1 + p_0 p + q_0 q}{1 + p^2 + q^2}\right]^k \frac{\sqrt{1 + p^2 + q^2}}{\sqrt{1 + p_0^2 + q_0^2}}$$
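
The normalized Minnaert function can be checked numerically against its two limiting cases. This is a minimal sketch with a hypothetical function name; returning zero for self-shadowed gradients is an assumption made here.

```python
import math


def minnaert(p, q, p0, q0, k):
    """Normalized Minnaert reflectance cos^k(i) cos^(k-1)(e) in gradient form."""
    c = 1.0 + p0 * p + q0 * q
    if c <= 0.0:
        return 0.0                      # assumption: self shadowed
    a = 1.0 + p * p + q * q
    b = 1.0 + p0 * p0 + q0 * q0
    return (c / a) ** k * math.sqrt(a / b)
```

With k = 1 this is the cosine of the incident angle, and with k = 1/2 it reduces to sqrt((1 + p0 p + q0 q)/(1 + p0² + q0²)), which depends only on the slope along the source azimuth.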
Particularly Simple Reflectance Maps Q R

Several methods discussed here have reflectance
depending only on the slope in the direction away from
the assumed light source, leading to parallel straight line
contours in the reflectance map. These include Wiechel's
first and second "approximations", Tanaka's relief contour
method, the "law" of Lommel and Seeliger, Minnaert's
formula when k = 1/2, as well as Marsik's automatic
relief shading. These methods are quite effective in
producing overlays that are easy to interpret. One can
construct more such reflectance maps, including some
that are even easier to calculate. One possibility, for
example, is,

$$R(p,q) = \tfrac{1}{2} + \tfrac{1}{2}\,(p' + a)/b$$

where,

$$p' = (p_0 p + q_0 q) \,/\, \sqrt{p_0^2 + q_0^2}$$

is the slope in the direction away from the source.
Values less than or equal to zero correspond to black,
while values greater than or equal to one correspond to
white. The parameters a and b allow one to choose the
gray value for horizontal surfaces and the rapidity with
which the gray value changes with surface inclination.
The simple program shown earlier (see Fig. 6) uses this
form with $a = 0$, $b = 1/\sqrt{2}$ and $p_0 = 1/\sqrt{2}$,
$q_0 = -1/\sqrt{2}$.

A simple alternative, somewhat reminiscent of
Lehmann's approach, is,

$$R(p,q) = \tfrac{1}{2} + (1/\pi) \tan^{-1}[\,(\pi/2)(p' + a)/b\,]$$

All possible slopes are mapped into the range from zero
to one. This has the advantage that the reflectance does
not saturate for any finite slope and all changes of
inclination in the vertical plane including the source
translate into changes in gray level.

Another way to achieve this effect is to use,

$$R(p,q) = \tfrac{1}{2} + \tfrac{1}{2}\,(p' + a) \,/\, \sqrt{b^2 + (p' + a)^2}$$

These three formulas are given in a form where the rate
at which the gray value changes with surface inclination
is the same at $(p' + a) = 0$.
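
The three one-parameter-family maps above are simple enough to compare side by side. The sketch below uses hypothetical function names and takes the slope p' as input; clipping the linear form to the range zero to one follows the black/white convention stated in the text.

```python
import math


def slope_away(p, q, p0, q0):
    """Slope p' in the direction away from the source."""
    return (p0 * p + q0 * q) / math.sqrt(p0 * p0 + q0 * q0)


def linear_map(pp, a, b):
    """Linear form, clipped to black (0) and white (1)."""
    return min(1.0, max(0.0, 0.5 + 0.5 * (pp + a) / b))


def arctan_map(pp, a, b):
    """Arctangent form: never saturates for finite slope."""
    return 0.5 + math.atan(0.5 * math.pi * (pp + a) / b) / math.pi


def root_map(pp, a, b):
    """Square-root form: also maps all slopes into (0, 1)."""
    t = pp + a
    return 0.5 + 0.5 * t / math.sqrt(b * b + t * t)
```

All three give the gray value 1/2 at p' + a = 0 and share the derivative 1/(2b) there, so they differ only in how quickly they approach black and white for steeper slopes.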

Glossiness  - The  First  Off-Specular  Angle 

Not  all  surfaces  are  matte.  Some  are  perfectly 
specular  or  mirror-like.  Since  smooth,  specularly 
reflecting  surfaces  form  virtual  images  of  the  objects 


around  them,  patches  of  high  brightness  will  appear  when 
such  a surface  is  illuminated  by  an  extended  source,  like 
a fluorescent  light  fixture,  or  by  light  streaming  in 
through  a window.  The  size  of  the  patches  depends  on 
the  solid  angle  subtended  by  the  source  as  well  as  the 
surface  curvature,  while  the  brightness  distribution  is  that 
of  the  source. 

To study reflection of an extended source in a
specular surface, it is useful to introduce the
"off-specular" angle, s, between the direction S to the center
of the source and the direction S', of the point that is
specularly reflected to the viewer (see Fig. 20). This,
incidentally, is also the angle between the direction to the
viewer, V, and the direction, V', in which the rays from
the center of the source are specularly reflected.

We assume a circularly symmetric source, with
brightness L(s) at eccentricity s. This is the brightness
the viewer observes in the specularly reflecting surface.
Calculating the first off-specular angle s is simple using
the appropriate spherical triangles.


Figure 20: Spherical triangles used to calculate the first
off-specular angle, s. It is the angle between
S, the center of the source, and S', the
direction from which light is specularly
reflected towards the viewer. Equivalently, it
is the angle between V, the direction of the
viewer, and V', the direction in which light
from the center of the source is specularly
reflected.


$$\cos s = \cos 2i \cos g - \sin 2i \sin g \cos x$$
$$\cos e = \cos i \cos g - \sin i \sin g \cos x$$

Here, i is the incident angle, between the local normal
and the direction to the source, $e = \theta$ is the emittance
angle, between local normal and the direction to the
viewer, while $g = \theta_0$ is the phase angle, between source
and viewer. Eliminating x from the two equations and
expanding the sine and cosine of 2i, one gets,

$$\cos s = 2 \cos i \cos e - \cos g$$

Substituting expressions in p and q for cos i, cos e
and cos g one can rewrite this as,

$$\cos s = \left[\, 2\,(1 + p_0 p + q_0 q)/(1 + p^2 + q^2) - 1 \,\right] / \sqrt{1 + p_0^2 + q_0^2}$$

This result can also be obtained simply by finding the
direction S' from which a ray must come to be specularly
reflected to the viewer V, by a surface element with
normal N,

$$S' = 2\,(V \cdot N)\, N - V$$

where V = (0,0,1). The off-specular angle is the angle
between S' and the center of the source, S, so

$$\cos s = S \cdot S' = 2\,(S \cdot N)(V \cdot N) - (S \cdot V)$$

Note that the cosine of the first off-specular angle
can be calculated easily, without using trigonometric
functions. The contours of constant cos s turn out to be
nested circles in gradient space, with centers lying on the
line from the origin to the point $(p_0, q_0)$. This can be
seen by noting that the locus of the point S', for
constant s, is a circle about the point S and that circles
on the Gaussian sphere give rise to circles in gradient
space when projected centrally.

The cosine of the off-specular angle, s, equals one
when conditions are right for specular reflection, that is,
when e = i and g = i + e. This can be seen by setting
e = i = g/2 in the trigonometric expression for cos s.
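
The agreement between the gradient-space expression and the vector form 2 (S·N)(V·N) - (S·V) can be confirmed with a short sketch. The function names are hypothetical; unit vectors are built from the gradients as (-p, -q, 1) normalized, with V = (0, 0, 1).

```python
import math


def cos_first_off_specular(p, q, p0, q0):
    """cos s from gradient components, with no trigonometric calls."""
    a = 1.0 + p * p + q * q
    c = 1.0 + p0 * p + q0 * q
    return (2.0 * c / a - 1.0) / math.sqrt(1.0 + p0 * p0 + q0 * q0)


def _unit(x, y, z):
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)


def cos_first_off_specular_vectors(p, q, p0, q0):
    """Same quantity from unit vectors: 2 (S.N)(V.N) - (S.V)."""
    n = _unit(-p, -q, 1.0)
    s = _unit(-p0, -q0, 1.0)
    s_dot_n = sum(si * ni for si, ni in zip(s, n))
    return 2.0 * s_dot_n * n[2] - s[2]   # V = (0, 0, 1), so V.N and S.V are z parts
```

For the standard source position the value reaches one at (p, q) = (√2 - 1)(p0, q0), the gradient whose normal bisects the directions to source and viewer.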


Bui-Tuong's Formula - Specular Surface, Extended Source S

Having seen how to calculate the off-specular angle s,
we can now make a reflectance map, by assigning the
distribution of source brightness, L(s). This function
should be non-negative, monotonically decreasing with s,
and equal to one when s = 0. For ease of calculation
one choice might be

$$L(s) = \cos^n(s/2) = [\,\tfrac{1}{2}(1 + \cos s)\,]^{n/2}$$

where n is a number that defines how compact the bright
patch is (a useful value might be around 20). So far, we
have developed the reflectance map for a specular surface
and a circularly symmetric source. Many surfaces, such
as glazed pottery or smooth plastic, have both glossy and
diffuse components of reflection. Specular reflection takes
place at the smooth interface between two materials of
different refractive index, while the matte component
results from scattering of light that penetrates some
distance into the surface layer.

We can combine these two components as follows

$$R(p,q) = [\,(1 - \alpha) + \alpha L(s)\,] \cos i \,/\, \cos(g/2)$$

where $\alpha$ determines how much of the incident light is
reflected specularly. The expression is scaled so that its
maximum is (approximately) equal to one. Here we have
assumed the source, while distributed, is compact enough
so that the diffuse reflection component can be
approximated as cos i. The above expression obeys the
reciprocity law of Helmholtz [123] which applies to real
surfaces illuminated by a point source. Bui-Tuong used a
reflectance function similar to the one derived above in
his computer graphics work [112]. He apparently tried to
model  reflection  from  a surface  that  is  not  perfectly 
smooth.  This  requires  a different  off-specular  angle 
however,  as  will  be  seen  in  the  next  section. 
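
A minimal sketch of the combined glossy and matte map for an extended source follows. The function name and the default values of alpha and n are illustrative assumptions, as is returning zero when the surface is turned away from the source.

```python
import math


def glossy_extended_source(p, q, p0, q0, alpha=0.5, n=20):
    """[(1 - alpha) + alpha L(s)] cos i / cos(g/2), with L(s) = cos^n(s/2)."""
    a = 1.0 + p * p + q * q
    b = 1.0 + p0 * p0 + q0 * q0
    c = 1.0 + p0 * p + q0 * q
    cos_i = c / math.sqrt(a * b)
    if cos_i <= 0.0:
        return 0.0                                  # assumption: self shadowed
    cos_g = 1.0 / math.sqrt(b)
    cos_s = (2.0 * c / a - 1.0) / math.sqrt(b)      # first off-specular angle
    # guard against tiny negative rounding before the fractional power
    source_brightness = max(0.0, 0.5 * (1.0 + cos_s)) ** (n / 2.0)
    cos_half_g = math.sqrt(0.5 * (1.0 + cos_g))
    return ((1.0 - alpha) + alpha * source_brightness) * cos_i / cos_half_g
```

At the gradient oriented for specular reflection, cos s = 1 and cos i = cos(g/2), so the value is exactly one regardless of alpha and n; elsewhere the matte term can push the value slightly above one, which is why the text calls the normalization approximate.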


Luster  - The  Second  Off-Specular  Angle 

Refulgency,  gloss  or  shine  can  also  appear  when  a 
point  source  is  reflected  in  a surface  that  is  not  perfectly 
smooth.  When  a slightly  uneven  surface,  of  a material 
that  gives  rise  to  metallic  or  dielectric  reflection,  is 
illuminated  by  a point  source,  bright  patches  will  be  seen 
surrounding  points  where  the  local  tangent  plane  is 
oriented  correctly  for  specular  reflection.  The  size  of 
these  patches  will  depend  on  the  roughness  of  the  surface 
and  the  surface  curvature,  while  the  distribution  of 
brightness  will  depend  to  some  extent  on  the  texture  of 
the  microstructure  of  the  surface. 

In  this  case  we  will  need  to  calculate  the  second 
off-specular  angle,  s',  between  the  local  normal,  N,  and 
the  normal,  N',  oriented  for  specular  reflection  of  rays 
from  the  source  S towards  the  viewer  V (see  Fig.  21). 


Figure 21: Spherical triangles used to calculate the
second off-specular angle, s'. It is the angle
between the actual surface normal, N, and a
surface normal, N', oriented to specularly
reflect rays from the source towards the
viewer.


By considering the appropriate spherical triangles one
finds,

$$\cos s' = \cos i \cos(g/2) - \sin i \sin(g/2) \cos x$$
$$\cos e = \cos i \cos g - \sin i \sin g \cos x$$

Eliminating x from the two equations and expanding the
sine and cosine of the phase angle g, one finds,

$$\cos s' = (\cos i + \cos e) \,/\, (2 \cos(g/2))$$
$$\cos s' = (\cos i + \cos e) \,/\, (\sqrt{2}\,\sqrt{1 + \cos g}\,)$$

This result can also be obtained by finding the
vector N', normal to a surface element oriented to
specularly reflect a ray from the source in the direction
of the viewer, V. That is,

$$N' = (S + V) \,/\, [\,\sqrt{2}\,\sqrt{1 + (S \cdot V)}\,]$$

The off-specular angle is the angle between the actual
surface normal N, and this vector N'.

$$\cos s' = N \cdot N' = [\,(S \cdot N) + (V \cdot N)\,] \,/\, [\,\sqrt{2}\,\sqrt{1 + (S \cdot V)}\,]$$

The surface microstructure of an uneven surface
can be modeled by many randomly disposed mirror-like
facets, too small to be optically resolved, each turned a
little from the average local surface orientation. One can
define a distribution, P(s'), describing what fraction of
these microscopic facets are turned away from the
average local normal by an angle s'. For ease of
calculation one choice might be,


Blinn's Formula - Rough Surface, Point Source T

One can use the fact that a normal, N', oriented for
specular reflection of the point source towards the viewer,
lies in the direction $(-p_1, -q_1, 1)$, where

$$p_1 = -\cos\phi_0 \tan(\theta_0/2)$$
and
$$q_1 = -\sin\phi_0 \tan(\theta_0/2)$$

We can also find N' by normalizing the vector (S + V),
so that its third component equals 1.

A surface with gradient $(p_1, q_1)$ is oriented just right to
specularly reflect a ray from the source to the viewer.
This can be seen by noting that when $p = p_1$ and $q = q_1$,

$$\cos i = \cos e = 1 / \sqrt{1 + p_1^2 + q_1^2}$$
and,
$$\cos g = 2/(1 + p_1^2 + q_1^2) - 1$$

In any case,

$$\cos s' = \frac{1 + p_1 p + q_1 q}{\sqrt{1 + p^2 + q^2}\,\sqrt{1 + p_1^2 + q_1^2}}$$

Note that s' will tend to be (roughly) half of s
when both angles are small. Combining matte
components of surface reflection with those from the
rough outer surface we get,

$$R(p,q) = [\,(1 - \alpha) + \alpha P(s')\,] \cos i \,/\, \cos(g/2)$$

The above reflectance map also obeys Helmholtz's
reciprocity law and is normalized so that its maximum is
(approximately) equal to one. Blinn and Newell give a
similar reflectance function, claiming it was what
Bui-Tuong had proposed [113]. The two are not the same
however, since the two off-specular angles are different;
in fact, the contours of constant s' are nested ellipses in
gradient space, while, as mentioned earlier, the contours
of constant s are nested circles. Indeed, Bui-Tuong's
model corresponds to reflection of an extended,
rotationally symmetric source in a specular surface, while
the model presented in this section applies to reflection of
a point source in a rough surface.
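
The rough-surface model can be sketched in the same gradient-space style. The function names are hypothetical. The facet distribution used here, cos^n(s'), is an assumption of this sketch chosen by analogy with the source brightness L(s) of the previous section; the defaults for alpha and n are likewise illustrative.

```python
import math


def specular_gradient(p0, q0):
    """Gradient (p1, q1) of the normal N' that mirrors the source to the viewer."""
    root = math.sqrt(1.0 + p0 * p0 + q0 * q0)
    # S + V is proportional to (-p0, -q0, 1 + root); scale its third part to 1
    return p0 / (1.0 + root), q0 / (1.0 + root)


def cos_second_off_specular(p, q, p0, q0):
    """cos s', the cosine of the angle between N and N'."""
    p1, q1 = specular_gradient(p0, q0)
    num = 1.0 + p1 * p + q1 * q
    den = math.sqrt(1.0 + p * p + q * q) * math.sqrt(1.0 + p1 * p1 + q1 * q1)
    return num / den


def lustrous_point_source(p, q, p0, q0, alpha=0.5, n=20):
    """[(1 - alpha) + alpha P(s')] cos i / cos(g/2), assuming P(s') = cos^n(s')."""
    p1, q1 = specular_gradient(p0, q0)
    a = 1.0 + p * p + q * q
    b = 1.0 + p0 * p0 + q0 * q0
    cos_i = (1.0 + p0 * p + q0 * q) / math.sqrt(a * b)
    if cos_i <= 0.0:
        return 0.0                                   # assumption: self shadowed
    facets = max(0.0, cos_second_off_specular(p, q, p0, q0)) ** n
    cos_half_g = 1.0 / math.sqrt(1.0 + p1 * p1 + q1 * q1)
    return ((1.0 - alpha) + alpha * facets) * cos_i / cos_half_g
```

The gradient form of cos s' agrees with (cos i + cos e)/(2 cos(g/2)) for any surface gradient, which gives a convenient consistency check on an implementation.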

Blinn  and  Newell’s  Model  for  Specular  Surfaces 

One  of  the  methods  discribed  by  Blinn  and  Newell 
[113J  assumes  a perfectly  specular  surface  in  which  the 
world  surrounding  the  object  is  reflected.  To  make 
computations  feasible,  they  imagine  the  surrounding 
objects  at  a distance  great  enough  so  that  each  part  of 
the  surround  appears  to  lie  in  essentially  the  same 
direction  from  every  point  of  the  surface  of  the  object. 
In  this  case  one  can  imagine  the  brightness  distribution  of 
the  surrounding  objects  projected  onto  the  inside  of  a 
large  sphere.  The  gray  value  used  for  a particular 
surface  patch  then  is  found  by  computing  the  direction  S' 
from  which  a ray  must  coine  to  be  specularly  reflected  to 
the  viewer  V,  by  a patch  with  surface  normal  N.  We 
have  already  seen  that, 

S'  * 2 (V-N)  N - V 

The  appropriate  gray  value  is  then  determined  from  the 
spherical  distribution  of  brightness.  In  practice  the 
sphere  is  mapped  onto  a plane  by  calculating  the  zenith 
angle,  0o  and  azimuth,  <t>0  of  S'  (113).  The  brightness 
Hiefribution  can  be  equally  well  specified  in  gradient 
space  (137),  since  it  is  also  a projection  of  the  Gaussian 
sphere. 
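
The mirror-direction computation can be sketched in a few lines of Python. This is an illustration only: the function names `reflect` and `zenith_azimuth` are hypothetical, V and N are assumed to be unit vectors, and taking the z-axis as the zenith direction (with azimuth measured in the x-y plane) is an assumed coordinate convention, not one taken from [113].

```python
import math

def reflect(v, n):
    """Return S' = 2 (V.N) N - V for unit vectors v (viewer) and n (normal)."""
    d = sum(a * b for a, b in zip(v, n))
    return tuple(2.0 * d * ni - vi for vi, ni in zip(v, n))

def zenith_azimuth(s):
    """Zenith angle (from the z-axis) and azimuth (in the x-y plane) of s.

    The axis convention is an assumption made for this sketch.
    """
    x, y, z = s
    r = math.sqrt(x * x + y * y + z * z)
    cos_zenith = max(-1.0, min(1.0, z / r))   # guard against rounding
    return math.acos(cos_zenith), math.atan2(y, x)

# Viewer looking straight down the z-axis at a patch tilted by 45 degrees:
# the reflected direction lies along the x-axis.
normal = (math.sin(math.pi / 4), 0.0, math.cos(math.pi / 4))
s_prime = reflect((0.0, 0.0, 1.0), normal)
theta0, phi0 = zenith_azimuth(s_prime)
```

The pair (theta0, phi0) would then index whatever planar map of the surrounding brightness distribution is in use.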

Bouguer’s  Surface  Model 

Surface  models  incorporating  randomly  dispersed 
mirror-like  facets  were  first  studied  around  1760  by 
Bouguer  (117).  This  type  of  micro-structure  has  been 
investigated  extensively  since  then,  despite  the  difficulties 
of  reasoning  about  the  three-dimensional  nature  of 
reflection  from  such  surfaces.  Recently,  Torrance  and 
Sparrow  further  elaborated  on  these  models  (133,134)  in 
order  to  match  more  closely  experimental  data  showing 
maximum  brightness  for  angles  of  reflection  larger  than 
the  incident  angle.  They  included  in  their  considerations 
the  effects  of  obstruction  of  the  incident  and  emergent 
rays  by  facets  near  the  one  reflecting  the  ray.  Blinn 


simplified  and  explained  their  calculations  [114]  and  used 
them  in  producing  shaded  images  of  computer  models  of 
various  objects.  The  overall  result  can  be  broken  into  a 
product  of  three  terms,  one  dependent  on  the  distribution 
of  facet  orientations,  the  second  being  the  formula  for 
Fresnel  reflection  from  a flat  dielectric  surface,  while  the 
third  is  the  geometric  attenuation  factor  accounting  for 
partial  occlusion  of  one  facet  by  another.  We  will  not 
discuss  these  models  in  any  more  detail  here. 

Models  for  glossy  or  lustrous  reflection  have  been 
used  with  great  success  in  computer  graphics  to  increase 
the  impression  of  realism  the  viewer  has  when  confronted 
with  a synthetic  picture  of  objects  represented  in  the 
computer.  Unfortunately,  these  methods  do  not  seem  to 
improve  the  presentation  of  surface  shape  for 
cartographic  purposes. 


Colored  Shading 

It  is  often  said  that  quantitative  information  about 
the  surface  cannot  be  obtained  from  relief  shading  (1). 
Contour  lines  on  the  other  hand  do  allow  measurements 
of  elevation  and  estimation  of  the  gradient.  Shading  does 
provide  some  information  about  the  gradient  too,  but 
cannot  be  used  to  determine  both  of  its  components 
locally,  since  only  one  measurement  is  available  at  each 
point.  Since  we  can  perceive  the  shape  of  objects 
portrayed  by  shaded  pictures,  it  seems  that  these  local 
constraints  do  lead  to  a global  appreciation  of  shape, 
apparently  based  on  our  assumption  that  the  surface  is 
continuous  and  smooth. 

If  two  shaded  images,  produced  with  the  assumed 
light  source  in  different  positions,  were  available  however, 
two  measurements  could  be  made  at  each  point  allowing 
one  to  determine  the  gradient  locally  (140).  It  is 
inconvenient  to  work  with  two  shaded  overlays; 
fortunately  though,  they  can  be  combined  by  printing 
them  in  different  colors.  In  fact,  yet  another  overlay  can 
be  added  in  a third  color,  but  it  adds  no  new 
information,  since  the  two  components  of  the  gradient 
are  already  fully  determined  by  the  first  two. 
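
As a hedged illustration of why two shaded images suffice, suppose (purely for the sake of the sketch) that each of the two reflectance maps is linear in the gradient, R_k(p,q) = a_k p + b_k q + c_k. The two gray values at a point then give two linear equations in p and q. The function name and the coefficient triples below are hypothetical; reflectance maps that are not linear would need a different solver.

```python
def gradient_from_two_shadings(r1, r2, map1, map2):
    """Recover (p, q) from two gray values under two linear reflectance maps.

    map1 and map2 are (a, b, c) triples with R(p, q) = a*p + b*q + c.
    Linearity is an assumption made for this sketch.
    """
    a1, b1, c1 = map1
    a2, b2, c2 = map2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("the two sources give dependent measurements")
    u, v = r1 - c1, r2 - c2
    # Cramer's rule for the 2x2 system
    return (u * b2 - v * b1) / det, (a1 * v - a2 * u) / det

# Two sources whose maps respond to p only and to q only, respectively.
map_a = (0.5, 0.0, 0.5)
map_b = (0.0, 0.5, 0.5)
p, q = gradient_from_two_shadings(0.65, 0.40, map_a, map_b)
```

A vanishing determinant corresponds to two source positions that measure the same slope component, which is why the sources must be placed in different directions.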

Colored  shading  corresponds  to  illumination  by 
multiple  sources,  each  of  a different  color.  The  exact 
color  at  each  point  in  the  printed  result  is  uniquely 
related  to  the  gradient  at  that  point.  Thus  quantitative 
information  is  available  in  this  new  kind  of  map  overlay. 
Further,  ambiguities  present  in  black  and  white 
presentations  disappear.  By  positioning  the  light  sources 
properly,  one  can  avoid  problems  occasioned  by  the 
accidental alignment of ridge or stream lines with the
direction of incident light. Thus the need for ad hoc
adjustments  of  the  azimuth  of  the  assumed  light  source  is 
removed. 

Colored  shading  is  easy  to  interpret  in  terms  of 
surface  shape  and  effective  in  portraying  surface  form.  It 
is  unlikely  however  that  it  will  be  widely  used  because  of 
the  added  expense  of  printing  and  conflict  with  existing 
uses  of  color  in  cartography  to  distinguish  various  kinds 
of planimetric information. Amongst other things, color
is now used to code height and surface cover. Further,
yellow is used in ordinary shading for sun-facing slopes,
while violet is used for shaded regions [148]. This is
thought  to  simulate  the  increased  sky  illumination 
component  in  areas  turned  away  from  the  sun. 

Summary  and  Conclusions 

After a brief review of the history of hill-shading, an
efficient method for providing shaded overlays was
described. It depends on a lookup table containing
sampled values of the reflectance map. Traditional,
manual methods were explored in terms of their
equivalent reflectance maps, as were phenomenological
models  used  in  the  computer  graphics  community. 
Methods  that  have  been  proposed  for  mechanizing  the 
generation  of  relief  shading  were  also  treated.  The 
automated  method  described  here  is  very  flexible,  since  it 
can  use  any  reflectance  map. 
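
The lookup-table idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the paper: the slopes p and q are estimated by simple first differences, the table covers a square range of slopes, out-of-range slopes are clamped, and `example_reflectance` is an arbitrary stand-in for whichever reflectance map is wanted. All names are hypothetical.

```python
def make_table(reflectance, lo, hi, n):
    """Sample reflectance(p, q) on an n-by-n grid covering [lo, hi] x [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [[reflectance(lo + i * step, lo + j * step) for j in range(n)]
            for i in range(n)]

def shade(z, table, lo, hi, spacing=1.0):
    """Shade a terrain grid z (list of rows) by table lookup on (p, q)."""
    n = len(table)
    scale = (n - 1) / (hi - lo)

    def index(slope):
        # nearest table entry, clamped to the sampled range
        return min(n - 1, max(0, round((slope - lo) * scale)))

    rows, cols = len(z), len(z[0])
    out = [[0.0] * (cols - 1) for _ in range(rows - 1)]
    for r in range(rows - 1):
        for c in range(cols - 1):
            p = (z[r][c + 1] - z[r][c]) / spacing   # slope along a row
            q = (z[r + 1][c] - z[r][c]) / spacing   # slope along a column
            out[r][c] = table[index(p)][index(q)]
    return out

def example_reflectance(p, q):
    """Arbitrary illustrative map: monotonic in one slope direction, clipped."""
    return max(0.0, min(1.0, 0.5 + 0.25 * (p - q)))

table = make_table(example_reflectance, -2.0, 2.0, 41)
ramp = [[float(c) for c in range(4)] for _ in range(3)]
gray = shade(ramp, table, -2.0, 2.0)
```

Swapping in a different reflectance map only requires rebuilding the table; the per-point work stays at two differences and one lookup.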

Some  reflectance  maps  appear  much  better  than 
others  in  conveying  an  immediate  impression  of  surface 
shape.  Rotationally  symmetric  reflectance  maps, 
corresponding  to  overhead  illumination  of  the  terrain,  are 
not  very  good  for  example.  Perfectly  diffuse  reflectance 
is  not  optimal  either.  In  fact,  various  approximations  to 
the  formula  for  a Lambertian  reflector  seem  to  produce 
better results. Simple monotonic functions of the slope in
the  direction  away  from  the  light  source  appear  to  be 
best.  Glossy  reflectance  components,  while  very  useful  in 
the  portrayal  of  regular  objects,  do  not  seem  to  be 
helpful  in  the  case  of  complicated,  irregular  surfaces. 
Shading  is  an  important  depth  cue.  The  choice  of 
reflectance  map  should  not  be  based  on  some  ad  hoc 
model  of  surface  behavior,  experimental  measurement  of 
reflectance  of  some  material,  or  formulas  that  happen  to 
be  easy  to  calculate.  Instead,  one  should  use  a 
reflectance  map  that  gives  rise  to  an  immediate,  accurate 
perception  of  surface  shape. 

It  is  important  to  arrange  for  the  range  of  gray 
tones  in  the  shaded  overlays  to  be  limited  so  as  to  avoid 
obscuring planimetric detail [149]. This is an area that


has  not  received  much  attention  so  far.  Another 
important  issue  relates  to  the  appropriate  scale  for  shaded 
overlays.  Shaded  overlays  are  useful  for  large  scale  maps. 
For  small  scale  maps  it  is  necessary  to  generalize  the 
surface  to  avoid  the  appearance  of  complex  textures  that 
may be difficult to interpret [1,49,76,77,150]. This
nonlinear  process  of  removing  small  hills,  ridges  and 
valleys  has  not  yet  been  satisfactorily  automated. 

An  as  yet  unexplored  possibility  depends  on  finely 
sampled  terrain  elevations.  This  is  the  ability  of  shading 
to  show  fine  detail.  Contour  maps  have  to  be  carefully 
generalized  or  smoothed  to  avoid  showing  confusing 
detail  on  a scale  smaller  than  the  contour  interval.  This 
is  not  the  case  with  shading,  although  historically  the 
manually  produced  maps  have  always  shown  only  quite 
coarse  features.  We  do  not  yet  know  whether  the 
textures  produced  by  the  shading  method  when  working 
from  really  fine  terrain  models  will  be  confusing,  or  of 
great value in identifying different types of terrain.

Figure 22: Reflectance maps in the order in which they were
introduced in this paper. The first ten appear on the left, the
last ten on the right. The letter codes correspond to the letters
after the section headings of the corresponding sections.

Figure 23: Relief shading produced using the reflectance maps
shown in the previous figure. The order is by rows from top to
bottom. Within each row the order is from left to right. The
digital terrain model used has 200 columns of 240 rows.


ABSTRACT

There is increasing interest in map
features such as points, lines and
regions both as a pictorial data base for
resource management and as an aid to
identifying objects in aerial images.
Owing to the very large amount of data
involved, and the need to perform
operations on this data efficiently, the
representation of such features is a
crucial issue. We describe a
hierarchical representation of map
features that consists of binary trees
with a special datum at each node. This
datum is called a strip and the tree
that contains such data is called a
strip tree. Lower levels in the tree
correspond to finer resolution
representations of the map feature. The
strip tree structure is a direct
consequence of using the method for
digitizing lines given by [Duda & Hart,
1973; Turner, 1974; Douglas & Peucker,
1973] and retaining all intermediate
steps. This representation has several
desirable properties. For features
which are well-behaved, calculations
such as point-membership and
intersection can be resolved in O(log n)
where n is the number of feature points.
The map features can be efficiently
encoded and displayed at various
resolutions. The representation is
closed under intersection and union and
these operations can be carried out at
different resolutions. All these
properties depend on the hierarchical
tree structure which allows primitive
operations to be performed at the lowest
possible resolution with great
computational savings. The strip tree
representation also can allow parts of
the map feature to be accessed
sequentially. This feature is usually
desired when the map feature is used in
analyzing images.

The price paid for the improved
performance is an increased storage
cost. This is approximately 4n, where n
is the storage needed to represent the
xy coordinates.


1.  Introduction 

We  present  a general  representation 
for  polylines  (connected  line  segments) 
and  areas  (closed  polylines).  Although 
this  representation  may  have  wide 
applications,  its  principal  motivation 
arose from the problem of representing
geographical  data  bases  of  map  features. 

A map  has  several  interesting  kinds 
of  features  such  as  contour  lines, 
lakes,  rivers,  roads,  etc.  These  can  be 
roughly  divided  into  four  feature 
classes  for  representation  in  the 
computer  (Sloan,  1978): 


feature    examples in map domain

points     towns (large scale maps),
           bridges (small scale maps)
lines      roads, coastlines
strips     wide roads, rivers
regions    lakes, counties

Our  main  interest  is  in  representing 
lines  and  regions.  A point  is  such  a 
simple  datum  that  it  can  be  easily 
treated  as  a primitive  in  any 
representation.  Collections  of  points 
from  a single  class  can  be  efficiently 
represented as k-d trees [Bentley, 1975;
Barrow et al., 1977] and so points are
not  the  focus  of  our  interest,  although 
they  do  interact  with  our 
representation.  A strip  feature  is 
essentially  a line  where  a locally 
varying  thickness  is  important,  examples 
of  which  are  rivers  and  roads.  As  we 
shall  see,  our  representation  for  lines 
will  also  encompass  this  type  of 
feature.

We regard collections of these map
features as a data base that might be
used to perform the following tasks:

• Find where a road intersects a
  river

• Display a subset of map features
  that appear in a given map sector

• Find out if a given point is in a
  region

• Search an aerial image near the
  edge of a dock for ships.

A very  important  aspect  of  all  these 
tasks  is  that  we  may  be  satisfied  if 
they are performed at a resolution lower
than  the  ultimate  resolution 
represented. 

Our  representation  for  lines  and 
regions  consists  of  a binary  tree 
structure  where,  in  general,  lower 
levels  in  the  tree  correspond  to  finer 
resolutions.  The  tree  structure  is  a 
direct  consequence  of  using  the  method 
for digitizing lines given by (Duda and
Hart, 1973; Turner, 1974) and retaining
all  intermediate  steps  in  the 
digitization  process.  As  an  example  of 
the  representation,  Figure  1 shows  some 
roads  represented  at  various  levels 
(resolutions)  in  the  tree  structure. 

The  idea  of  representing  a line  by 
sets  of  strips  was  recognized  by 
(Peucker,  1976).  In  particular  he  was 
able  to  find  line  intersection  and  point 
in  polygon  algorithms.  However,  the 
tree structure is a vast improvement
over the set organization: the
algorithms are more efficient, line-area
intersection and area-area intersection
and union can now be dealt with, and
the  tree  structures  are  closed  under 
these  operations. 


Figure 1. Map features displayed at
various  resolutions  using  the 
hierarchical  structure. 


2.  The  Strip  Tree 
2.1  Notation 

We define a strip segment L(delta)
as the vector L and the scalar delta as
shown by Figure 2. The vector L starts
at (XBeg, YBeg) and ends at (XEnd, YEnd).
We use S to denote the set of points
inscribed by the rectangle defined by
L(delta). Also we denote the boundaries
of the rectangle by the line segments
l+, l-, e+, e- as shown.


Figure 2. Definition of a Strip
Segment.


A polyline is an ordered list of
discrete points y0,...,yn, subsets of
which may be colinear. For the moment
we require these points to be considered
as connected; later we will relax this
condition. We say a polyline is
represented at resolution delta* if
there exists an ordered sequence of m
strip segments

   L_k(delta_k),  k = 0, ..., m-1

such that

   delta_k < delta*,  k = 0, ..., m-1

   y_i ∈ S_0 ∪ ... ∪ S_{m-1},  i = 0, ..., n

If within a strip segment there is a
point y that is a member of e+, another
that is a member of e-, and there is a
point y that is a member of l+ and
another that is a member of l-, then the
strip segment is said to be compact.
The  compactness  property  is  very 
important  for  some  of  the  algorithms 
which  follow.  Figure  1 shows  some 
examples  for  different  deltas. 


2.2.  Digitization 

Suppose we have a polyline P, such
as shown by Figure 3a. For any
resolution  delta  we  can  approximate  this 
line  with  strip  segments  as  follows 
(Duda  & Hart,  1973;  Turner,  1974): 


Consider the polyline P
defined by (y0,...,yn). For
each point y in P find the
perpendicular distance d(y)
from y to L, the vector from
y0 to yn. Denote the
subset of y in P such that y.L > 0
as P+, and let P- = P - P+. Now find

   d+ = max d(y) over y in P+  and
   d- = max d(y) over y in P-.

If (d+) + (d-) < delta* then
the polyline is compactly
represented at resolution delta* by
the strip tree consisting of a
single root strip
L((d+) + (d-)). If not then the
desired strip tree is obtained
by recursively applying the
algorithm to the Ps y0,...,y+
and (y+)+1,...,yn (the points up
to and after the point y+ of
maximum distance) and making
the results the left son and
right son respectively of the
strip tree. In the case of
ties for the maximum distance
d, we will arbitrarily pick
the point nearest the mid
point (in arc length).
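
The recursive digitization can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation, and it makes several assumptions that should be read as such: the tree is always expanded down to two-point leaves (no delta* cutoff), the split vertex is shared by both halves, ties are broken by vertex index rather than by arc length, and sidedness is decided by the sign of a cross product with the chord. The names `Strip`, `signed_distances`, `build_strip_tree` and `leaves` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class Strip:
    beg: Point                      # (XBeg, YBeg)
    end: Point                      # (XEnd, YEnd)
    width: float                    # d+ + d-, measured across the chord
    left: Optional["Strip"] = None
    right: Optional["Strip"] = None

def signed_distances(pts: List[Point]) -> List[float]:
    """Signed perpendicular distance of each point from the chord pts[0] -> pts[-1]."""
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        # Degenerate chord (coincident end points): fall back to the plain
        # distance from the start point. This fallback is an assumption.
        return [math.hypot(x - x0, y - y0) for x, y in pts]
    return [((x - x0) * dy - (y - y0) * dx) / norm for x, y in pts]

def build_strip_tree(pts: List[Point]) -> Strip:
    """Build a fully expanded strip tree for a polyline with at least 2 points."""
    d = signed_distances(pts)
    d_plus = max(0.0, max(d))       # farthest point on one side of the chord
    d_minus = max(0.0, -min(d))     # farthest point on the other side
    node = Strip(pts[0], pts[-1], d_plus + d_minus)
    if len(pts) > 2:
        interior = range(1, len(pts) - 1)
        far = max(abs(d[i]) for i in interior)
        mid = (len(pts) - 1) / 2
        # among the farthest interior points, take the one nearest the middle
        k = min((i for i in interior if abs(d[i]) == far),
                key=lambda i: abs(i - mid))
        node.left = build_strip_tree(pts[:k + 1])
        node.right = build_strip_tree(pts[k:])
    return node

def leaves(t: Strip) -> List[Strip]:
    """Leaf strips in order along the polyline."""
    return [t] if t.left is None else leaves(t.left) + leaves(t.right)

zigzag = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (3.0, -1.0), (4.0, 0.0)]
root = build_strip_tree(zigzag)
```

Because every split removes at least one vertex from each half, the recursion ends with one leaf per original line segment.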


For  the  purposes  of  the  union  and 
intersection  algorithms  to  follow  it  is 
helpful  to  think  of  the  strip  trees  as 
completely  expanded  down  to  individual 
points,  even  though  these  points  may  be 
colinear.  Figure  3 shows  an  example  of 
two  levels  of  recursion  of  this 
algorithm. 

Figure 3. Steps in the Digitization
Process.

To  see  formally  that  the 
convergence  is  guaranteed,  note  that  a P 
of  k points  can  always  be  approximated 
by  a single  strip  segment  L(k)  with 
length  k assuming  eight-connectedness. 
Thus  for  any  delta  there  must  be  a strip 
tree  with  leaves  consisting  of  no  more 
than  2n/delta  strip  segments  which 
approximate  P.  Since  the  digitization 
algorithm  splits  each  P into  two  parts 
such that each part has finite length,
the  process  must  ultimately  consider 
sets  of  P of  delta  points  or  less. 


2.3  Strip  Tree  definitions 

The  binary  tree  resulting  from  the 
digitization  process  is  called  a strip 
tree,  where  the  datum  at  each  node  is  a 
strip,  L.  The  nodes  of  the  tree  are 
initially  ordered  on  arc  length.  (Later 
we  will  see  that  when  intersection 
occurs  in  two  areas  which  are 
represented  in  strip  trees,  this 
property is sometimes not preserved.)

In  the  ensuing  algorithms  we  will 
use  the  following  definitions: 

T = symbol for a Strip Tree obtained by
    the digitization process.

S(T) = the points associated with the
    strip at the root node of T;
    i.e. {x | x ∈ S(T)}

Area(T) = the area associated with the
    strip at the root node of T. We
    measure area in pixels so that a
    strip L(0) still has finite area.
    The most primitive strip, a single
    point, has unit area.

LSon(T) = the left son of the node T
RSon(T) = the right son of the node T

A node of the strip tree is completely
defined by the seven-tuple (LSon, RSon,
Area, XBeg, XEnd, YBeg, YEnd). The
measure Area(T) is better for some of
the algorithms to follow. Area and
delta are related by delta = Area/||L||.


2.4.  Why  Binary  Trees? 

The polylines can also be
represented as a tree with nodes of more
than two sons. In fact, nodes could
have different numbers of sons which
would still be ordered. Figure 4 shows
an example of the alternate encoding
scheme. In certain cases this may be a
more concise representation for the
polyline and for all the algorithms that
follow we can extend the operations from
two sons to multiple sons. However,
this change does not alter the
complexity of the operations that we
would like to perform and can be less
efficient than the binary tree
representation.

Figure  4.  A portion  of  an  encoding 
using  m-ary  trees. 


3.  Operations  on  Polylines 

Computational  complexity  of  the 
various  operations  is  difficult  to 
characterize, as it depends on the
particular  geometry  of  polylines.  If 
the  polylines  are  "well-behaved",  that 
is  they  are  relatively  smooth  and  do  not 
self-intersect  for  more  than  a few 
points,  then  the  algorithms  are  very 
efficient.  What  this  means  for  a 
particular  operation  in  terms  of  the 
strip  tree  is  that  if  the  number  of 
strips  that  must  be  examined  at  any 
level  is  constant,  then  the  complexity 
of the operation is O(log n).


3.1.  Testing  the  Proximity  of  a Point 

If  we  would  like  to  find  out  if  a 
point  is  near  a polyline,  this  may  be 
discovered  early  using  the  strip  tree. 
We  can  make  this  more  precise  by 
exploiting  the  following  property: 

Property P1:

A. If a point z is inside a
compact strip L(delta) then it
can be at most 2*delta units away
from the P.

B. If a point z is outside a
compact strip L(delta) then
the distance of the point from
the P is bounded by

   0 < ds(z,P) < ds(z, L(delta)) + 2*delta

It is interesting to study these bounds
as the depth in the resolution tree
increases. Although the convergence is
not monotonic, the bounds do converge to
the actual set-theoretic distance
ds(z,P). Now suppose we want to answer
the question: is ds(z,P) < d0? If this
can be answered affirmatively we will
find this out at the point where any
upper bound is less than d0. If the
answer is no, then this will be
discovered when the tree has been
explored to the point where all minimum
bounds are greater than d0. Similar
arguments can be made for the
qualitative level-of-effort required to
answer: is ds(z,P) > d0? From this
discussion we can see that the search
will be inefficient only if
ds(z,S(T)) and a large number of the
strips are nearly d0 from z. Figure 5a
shows this case together with a more
representative example.


Figure 5. Two of many Possible
Geometries When Testing the Distance of a
Point from a P.


To summarize this discussion, we provide
the algorithms to test for ds(z,P) < d0
and ds(z,P) > d0. These algorithms use
the notion of the distance of a point to
a set which is defined as follows. For
any strip S, if a point z is outside S,
i.e. z ∉ S, then its distance to S is
characterized by the set theoretic
distance ds(z,S) = min d(x,z), taken
over x ∈ S, where d is
the euclidean distance between the
points x and z. For clarity, the
algorithms are presented as procedures
in a pseudo-Algol language. Rigor has
been sacrificed mainly in the
specification of data types, but these
should be obvious from the earlier
definitions.


Algorithm A1: Is a point within d0 of a
polyline?

boolean procedure Within (z,d0,T)
begin
   comment the upper bound on the distance is below d0;
   if ds(z,S(T)) + 2*delta(T) < d0
      then return (true);
   comment the lower bound on the distance exceeds d0;
   if z ∉ S(T) and ds(z,S(T)) > d0
      then return (false);
   return (Within (z,d0,LSon(T))
      or Within (z,d0,RSon(T)));
end;


Algorithm A2: Is a point further
than d0 from a polyline?

boolean procedure Further (z,d0,T)
begin
   comment the upper bound on the distance is below d0;
   if ds(z,S(T)) + 2*delta(T) < d0
      then return (false);
   comment the lower bound on the distance exceeds d0;
   if z ∉ S(T) and ds(z,S(T)) > d0
      then return (true);
   return (Further (z,d0,LSon(T))
      and Further (z,d0,RSon(T)));
end;
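
A runnable sketch of the proximity test is given below. It follows the bounding argument of Property P1 but, as a simplifying assumption, replaces the rectangle geometry by a cruder bound: each node stores the largest distance r of any covered vertex from its chord, so that the distance from z to a connected polyline lies between ds - r and ds + r, where ds is the distance from z to the chord. The names `seg_dist`, `Node`, `build` and `within` are hypothetical and the tree construction is abbreviated.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

def seg_dist(z: Point, a: Point, b: Point) -> float:
    """Euclidean distance from z to the closed segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    t = 0.0
    if len2 > 0.0:
        t = max(0.0, min(1.0, ((z[0] - a[0]) * dx + (z[1] - a[1]) * dy) / len2))
    return math.hypot(z[0] - (a[0] + t * dx), z[1] - (a[1] + t * dy))

@dataclass
class Node:
    beg: Point
    end: Point
    r: float                          # max vertex distance from the chord
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build(pts: List[Point]) -> Node:
    """Split at the interior vertex farthest from the chord (shared by both halves)."""
    d = [seg_dist(p, pts[0], pts[-1]) for p in pts]
    node = Node(pts[0], pts[-1], max(d))
    if len(pts) > 2:
        k = max(range(1, len(pts) - 1), key=lambda i: d[i])
        node.left, node.right = build(pts[:k + 1]), build(pts[k:])
    return node

def within(z: Point, d0: float, t: Node) -> bool:
    """Is z closer than d0 to the polyline represented by t?"""
    ds = seg_dist(z, t.beg, t.end)
    if ds + t.r < d0:                 # upper bound already below d0
        return True
    if ds - t.r >= d0:                # lower bound already at or above d0
        return False
    if t.left is None:                # leaf: r is 0, so ds is exact
        return ds < d0
    return within(z, d0, t.left) or within(z, d0, t.right)

zigzag = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (3.0, -1.0), (4.0, 0.0)]
tree = build(zigzag)
near = within((1.0, 2.0), 1.5, tree)
far = within((2.0, 5.0), 4.0, tree)
```

Queries far from the root chord are settled at the root without descending, which is the saving the strip tree is meant to provide.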

3.2  Displaying  a Polyline  at  Different 
Resolutions 

As previously demonstrated in
Section 2, a polyline may be represented
as a set of strip segments such that
each strip segment L has a resolution
delta less than some fixed delta0. The
algorithm to display such a
representation using the strip tree is
as follows. This algorithm uses a
device-dependent subroutine
DisplayRectangle which paints the
rectangle on the particular display
device.

Algorithm A3: Display a polyline at
Resolution delta0

procedure PolyDisplay (T,delta0)
begin
   if delta(T) < delta0 then
      DisplayRectangle (L(T),delta(T))
   else (PolyDisplay (LSon(T),delta0) and
      PolyDisplay (RSon(T),delta0));
end;
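
The display recursion can be sketched in Python by collecting the strips that would be painted instead of calling a device routine. This is an illustration under assumptions: `StripNode` is a hypothetical stand-in for a strip tree node, the small tree is built by hand, and a leaf is always emitted even if its delta is not below delta0 (a guard the pseudo-Algol version leaves implicit).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class StripNode:
    beg: Point
    end: Point
    delta: float
    left: Optional["StripNode"] = None
    right: Optional["StripNode"] = None

def strips_at_resolution(t: StripNode, delta0: float) -> List[StripNode]:
    """Strips that a display routine would paint at resolution delta0."""
    if t.delta < delta0 or t.left is None or t.right is None:
        return [t]
    return (strips_at_resolution(t.left, delta0)
            + strips_at_resolution(t.right, delta0))

# Hand-built two-level tree for a polyline (0,0) -> (2,2) -> (4,0).
leaf_a = StripNode((0.0, 0.0), (2.0, 2.0), 0.0)
leaf_b = StripNode((2.0, 2.0), (4.0, 0.0), 0.0)
root = StripNode((0.0, 0.0), (4.0, 0.0), 2.0, leaf_a, leaf_b)

coarse = strips_at_resolution(root, 3.0)   # root strip is fine enough
fine = strips_at_resolution(root, 1.0)     # must descend to the leaves
```

Each returned strip would be handed to the device-dependent rectangle painter.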


3.3 Intersecting Two Polylines

One  of  the  important  features  of 
the  representation  is  the  ability  to 
compute  intersections  between  polylines. 
Strip  trees  provide  the  facility  to  not 
only  compute  intersection  points,  but, 
in  the  case  where  lower  resolution  is 
satisfactory,  to  compute  small  areas 
containing  the  intersection  points  at 
great  computational  savings.  In  order 
to  develop  the  intersection  methodology, 
we  need  the  following  definitions: 

A. Two strip segments (L1
   derived from P1) and (L2
   derived from P2) do not
   intersect iff L1 ∩ L2 = ∅.

B. Two strip segments L1, L2
   have a clear intersection
   iff l1+ and l1- intersect
   l2+ and l2-.

C. Two strip segments L1 and
   L2 have a possible
   intersection if condition
   B is not satisfied yet
   L1 ∩ L2 ≠ ∅.

These  cases  are  illustrated  by  Figure  6. 
A fairly  obvious  but  very  important 
lemma  is: 

Clear Intersection Lemma.
[Peucker, 1976] If two strip
segments have a clear
intersection and the strips
are both compact, then the
corresponding Ps must also
intersect.

To see this for condition B, consult
Figure 6b. P1 divides the region R into
two parts and P2 must cross from one to
the other. The only way the P2 can do
this is by intersecting P1.


Figure 6: Intersect ...

... to ... polylines ... recursive, and assume the
existence of an integer procedure StripIntersection
which will return the type of
intersection and, in the case of a clear
type, will return a parallelogram Q
containing the intersection points.

Algorithm A4: Finding out whether
two polylines intersect

Comment. If the two root
strip segments do not
intersect then the Ps do not
intersect. If the root
segments have a clear
intersection then the Ps
intersect. Since the task is
to just determine whether or
not an intersection exists, we
are done the moment we find a
clear intersection.

boolean procedure Intersection
   (T1,T2,PrimitiveFlag)
comment PrimitiveFlag allows
   the use of a single
   strip as the first argument
begin
   Case StripIntersection
      (S(T1),S(T2),Q) into
      (Null) return (false);
      (Possible) if
         (Area(T1)>Area(T2)) or
         (PrimitiveFlag) then
         return
            (Intersection (LSon(T1),T2) or
             Intersection (RSon(T1),T2))
         else return
            (Intersection (T1,LSon(T2))
             or Intersection (T1,RSon(T2)));
      (Clear) return (true);
end;

This procedure is easily modified to
return a set of parallelograms
comprising intersection points. Further
easy modifications can be made to
constrain these parallelograms to be of
a certain size related to the delta(T1)
and delta(T2); i.e., they can be made
to be as small as we want.

Note,  however,  that  smaller 
resolutions  may  be  much  more 
computationally  expensive,  as  shown  in 
the  following  example  (Figure  7)  where 
intersection  at  the  coarsest  resolution 
is simple, but multiple intersections
occur  at  lower  levels. 

Figure 7: An intersection may be simple
at one level and complicated at lower
levels.

If the two Ps are not convoluted
about each other the intersection will
be computed in O(m log(n)) steps where m
is the number of intersection points.
If the Ps do not intersect but have a
closest distance d = ds(P1,P2) then this
will be discovered at a level in the
tree no deeper than a point where
4/i  > delta  1*  cStit»2. 

The worst case performance is
intolerable as the algorithm's
computation will grow exponentially as
long as all the strip segments in one
tree intersect all the strip segments in
the other. In fact, the computation can
be shown to be O(2^K) where K is the
sum of the depths in each tree where the


3.4 The Union of Two Polylines

The  union  of  two  strip  trees  can  be 
accomplished  by  defining  a strip  that 
covers  both  of  the  two  root  strips. 


Algorithm A5: P-P Union.

For two Ps, each defined by
its ordered list of points, treat
these as two subsets and
concatenate the subsets.
That is, the resultant
ordering places the points of
the first P before those of
the second.

Now define a strip
segment that covers the
concatenated set of points
such that c = 0 and
delta = d*. By
construction, this
satisfies all the
properties of a strip
segment. Make this the
root node of a new
P-tree. The two subtrees
are the two Ps of the
union.

This  construction  is  shown  in  Figure  8. 

The  variable  c is  defined  below. 


Figure 8: Construction for Union of
Strip Trees Representing Two Polylines


Of  course  this  construction  introduces  a 
problem  in  that  the  new  strip  is  no 
longer  compact  and  therefore  the  Clear 
Intersection  Lemma  no  longer  holds.  To 
overcome this problem we must add one
bit of information to each node to mark
whether the underlying polyline is
compact. Since later algorithms may
result in underlying polylines that are
disconnected, we include this in the
following definition of C:

C(T) = 1 if the P represented by S(T)
         is known to be compact and connected
     = 0 otherwise

With this strategy we can preserve the
elegance of the previous algorithms in
the following manner: When bit C(T) is
not one we apply the recursion
regardless of the intersection type. In
algorithm A4 this means that clear
intersections are reported as possible
if the bit C(T) is not set.

This  technique  can  also  be  used  as 
a digitization  method  for  m 
non-connected  segments 

These  segments  are  given  an  ordering  as 
shown.  The  previous  digitization 
algorithm  is  applied  to  this  set  of 
points,  and  the  perpendicular  distance 
d*  is  computed  from  the  set  of 
disconnected  ys  and  used  to  define  the 
delta of the root strip as before. However
now  the  set  is  divided  into  two  subsets 
of  connected  segments  (rather  than  using 
y*)  and  the  digitization  algorithm  is 
applied  recursively  to  the  subsets. 
Once  this  process  produces  connected 
subsets,  the  earlier  digitization  scheme 
is  applied. 


4.  Areas  Represented  by  Strip  Trees 

We  take  the  boundary  of  an  area  to 
be  a closed  polyline.  Interestingly 
enough,  the  digitization  method 
described  in  Section  2 works  for  closed 
polylines  and,  incidentally,  also  for 
self-intersecting  polylines. 

Furthermore,  if  an  area  is  not  simply 
connected  it  can  still  be  represented  as 
a strip  tree,  which  at  some  level  has 
connected  primitives.  The  method  for 
doing  so  was  described  in  the  previous 
section. If a region has holes it can
be represented by a single boundary
curve using a construction (Figure 9).
If the holes are important, they
themselves should be independently
represented as strip trees.

Figure 9: A Region with a Hole

The  most  remarkable  fact  is  that  by 
representing  an  area  in  this  way  many 
useful  operations  such  as  intersection 
between  a polyline  and  an  area, 

determining  whether  a point  is  inside  an 
area,  and  intersecting  two  areas  are 
carried  out  very  efficiently. 

4.1  Determining  Whether  a Point  is 
Inside  an  Area 

The  strip  tree  representation  of  an 
area  by  its  boundary  allows  the 
determination  of  whether  a point  is 
inside  the  area  in  a straightforward 
manner. If any semi-infinite line
terminating at the point intersects the
boundary of the area an odd number of
times, the point is inside. This result
appears in [Minsky and Papert, 1969].
This result is computationally
simplified  for  strip  trees  in  the 
following  manner: 

Point  Membership  Property 

To decide whether a point
z is a member of an area
represented by a strip
tree, we need only
compute the number of
clear intersections of
the strip tree with any
semi-infinite strip L
which has delta = 0 and
emanates from z. If this
number is odd then the
point is inside the area.


An extension to the clear
intersection lemma which makes this
property hold is that the underlying
curves may intersect more than once but
must intersect an odd number of times.
The following algorithm is used to
determine whether a point is inside an
area.

Algorithm A6: Point Membership

boolean procedure Inside(z,T)
begin
   CreateStrip(S0,z);
   comment CreateStrip creates a
           strip for the half line;
   if NoOfClearIntersections(S0,T) is odd
      then return(true)
      else return(false);
end;

integer procedure NoOfClearIntersections(S,T)
begin
   Case StripIntersection(S,S(T)) into
      [Null]     return(0);
      [Possible] return(NoOfClearIntersections(S,LSon(T))
                        + NoOfClearIntersections(S,RSon(T)));
      [Clear]    return(1);
end;

A potential difficulty exists with
the procedure NoOfClearIntersections
when the strip S0 is tangent to the
polyline. Since this problem will only
occur at the lowest level of the tree,
we can examine neighboring leaves of the
tree to resolve it.
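
The parity rule behind Algorithm A6 can be sketched in Python as follows. This is only an illustration under stated assumptions: axis-aligned bounding boxes stand in for oriented strips, the half line runs from z toward +x, and the names Node, build, crossings and inside are hypothetical. Only the Null case (prune) and the Possible case (descend) are modeled. The Clear shortcut is omitted, so every surviving branch is resolved at a leaf, and a half-open comparison at the leaves plays the role of the tangency check.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


@dataclass
class Node:
    # Axis-aligned box standing in for the oriented strip (an assumption).
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    seg: Optional[Segment] = None      # set only at leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(segs: List[Segment]) -> Node:
    """Build a binary tree over consecutive boundary segments."""
    xs = [p[0] for s in segs for p in s]
    ys = [p[1] for s in segs for p in s]
    node = Node(min(xs), min(ys), max(xs), max(ys))
    if len(segs) == 1:
        node.seg = segs[0]
    else:
        mid = len(segs) // 2
        node.left, node.right = build(segs[:mid]), build(segs[mid:])
    return node


def crossings(node: Node, z: Point) -> int:
    """Count crossings of the half line from z toward +x with the boundary."""
    x, y = z
    # Null case: the half line cannot meet anything inside this box.
    if y < node.ymin or y > node.ymax or x > node.xmax:
        return 0
    # Possible case: resolve at the sons.
    if node.seg is None:
        return crossings(node.left, z) + crossings(node.right, z)
    (x1, y1), (x2, y2) = node.seg
    # Half-open rule: a vertex shared by two segments is counted once,
    # and a horizontal (tangent) segment is never counted.
    if (y1 > y) == (y2 > y):
        return 0
    x_hit = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
    return 1 if x_hit > x else 0


def inside(boundary: List[Point], z: Point) -> bool:
    """Odd number of crossings means z is inside the closed polyline."""
    segs = list(zip(boundary, boundary[1:] + boundary[:1]))
    return crossings(build(segs), z) % 2 == 1
```

For a square with corners (0, 0) and (4, 4), inside reports the point (2, 2) as a member and (5, 2) as a non-member. Most of the boundary is never examined when the box test prunes a subtree, which is the saving the strip tree is meant to provide.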


4.2 Intersecting a Polyline with an
Area

The strategy behind intersecting a
strip tree representing a polyline with
a strip tree representing an area is to
create a new tree for the portion of the
polyline which overlaps the area. This
can be done efficiently by trimming the
original polyline strip tree, taking
advantage of an obvious property of the
intersection process:


Pruning Property:
Consider two strips Sp ∈
Tp and Sa ∈ Ta. If the
intersection Sp ∩ Ta is
null, then (a) if any
point on Sp is inside Ta,
the entire tree whose
root strip is Sp is
inside or on Ta, and (b)
if any point on Sp is
outside of Ta, then the
entire tree whose root
strip is Sp is outside of
Ta.

This leads to the recursive
procedure A7 for polyline-area
intersection using trees. Note that
since strip nodes under a clear or
possible strip intersection may be
pruned, the bit c for the latter strip
is set to 0 to denote that it no longer
has the compactness property. Of course,
as repeated intersections are carried
out with different areas, more and more
upper-level strips may have their bits
set to 0; nevertheless, the intersected
polyline is accurately represented at
the leaves of the strip tree.

Note that if the polyline strip is
"fatter," i.e., Area(T1) > Area(T2), we
can copy the node and resolve the
intersection at lower levels, whereas in
the converse case we have to
sequentially prune the tree by first
intersecting the polyline strip with the
left area strip and then intersecting
the resultant pruned tree with the right
area strip.


Algorithm A7: Polyline-Area Intersection

reference procedure PolyAreaInt(T1,T2)
begin
   A := T2;
   comment A is a global used by PAInt;
   return(PAInt(T1,T2));
end;

reference procedure PAInt(T1,T2)
begin
   Case StripInt(T1,T2) into
      [Null or Primitive]
         if Intersection(T1,A,TRUE) = null then
            if Inside(T1,A) then return(T1)
            else return(null)
         else return(T1);
      [Clear or Possible]
         if Area(T1) > Area(T2) then
         begin
            C(NT) := 0;
            comment non-compact strip;
            XBeg(NT) := XBeg(T1);
            YBeg(NT) := YBeg(T1);
            XEnd(NT) := XEnd(T1);
            YEnd(NT) := YEnd(T1);
            Area(NT) := Area(T1);
            LSon(NT) := PAInt(LSon(T1),T2);
            RSon(NT) := PAInt(RSon(T1),T2);
            return(NT);
         end
         else comment Area(T1) < Area(T2);
            return(PAInt(PAInt(T1,LSon(T2)),RSon(T2)));
end;

Note that in the case of areas that
intersect in a way that fragments their
boundaries, the order of the segments
will not be preserved by the
intersection procedure. (Until this
point we were guaranteed that strips in
the tree would be ordered according to
the arc length of their underlying
polylines). However, all the other
properties of the representation are
preserved.


4.3  Intersecting  Two  Areas 

The problem of intersecting two
areas can be efficiently carried out
using their strip tree representations.
The method is to decompose the problem
into two polyline-area intersection
problems (refer to Figure 10).


Figure 10: Decomposition of Area-Area
Intersections

If we treat the boundary of A1 as
representing a polyline instead of
representing an area and intersect its
strip tree with the strip tree
representing A2, the lowest level result
is shown by the thick lines in Figure
10a. If we reverse the roles of the two
strip trees the result is given by the
thick lines in Figure 10b. The union of
these two strip trees (see Section 4.4)
is the answer we want. Thus we can
write the area-area intersection
procedure in terms of strips as follows:

Algorithm A8: Area-Area Intersection

reference procedure AreaAreaInt(T1,T2)
begin
   return(Union(PolyAreaInt(T1,T2),
                PolyAreaInt(T2,T1)));
end;

where Union is a procedure that
accomplishes the construction described
in the following section.


4.4  The  Union  Operation 

The union operations are slightly
simpler than the intersection operation.
For the union of a P-tree and a P-tree
we use a construction similar to the
digitization methods for disconnected Ps.
The result is a P-tree. Note that the
union operation for strip trees is not
commutative. Also, we do not define a
union operation for a strip tree
representing a polyline and a strip tree
representing a region. The union of two
region strip trees is defined and is a
region strip tree. If these two strip
trees do not intersect, then the union
is straightforward and is identical to
the method for polylines. However, if
they do intersect, then we must go to
the trouble of defining a new strip tree
that represents the union by finding the
points of intersection in the same way
as was done for region strip tree
intersections.

5.  Conclusions 

Strip  trees  provide  a powerful 
representation  for  polylines  and  areas. 
Current  work  is  directed  towards 
characterizing  their  computational 
complexity  more  precisely  but  it  can 
already  be  shown  that  the  representation 
is  superior  to  its  competitors.  The 
main  drawback  is  that  there  is  a large 
overhead  in  terms  of  space.  If  n is  the 
required  space  to  represent  a polyline 
then  its  strip  tree  will  take  about  4n 
space units. Also the creation of a
strip tree is a laborious process,
requiring O(n log n) time units.
However, neither of these drawbacks is
thought to be important in the use of
this representation for geographical
data bases.


The representation defines strip
segments as primitives to cover subsets
of the line after [Peucker, 1976]. Our
organization of these segments into a
tree may be viewed as a particular case
of a general strategy of dividing
features up and covering them with
arbitrary shapes such as depicted by
Figure 10. Other attempts in this class
have been tried by [Barrow et al., 1977;
Burton, 1977; Tanimoto, 1975], but they
do not capture the notions of
orientation and resolution anywhere
nearly as precisely as strip segments,
and do not have the union and
intersection properties.
Figure 1. Map features displayed at various
resolutions using the hierarchical structure.

Figure 4. A portion of an encoding using m-ary trees.


Figure  5:  Two  of  Many  Possible  Geometries  When  Testing 
the  Distance  of  a Point  from  a P. 


Figure  6:  Different  Ways  Strips  Can  Intersect 
ABSTRACT

A new approach to deriving three-dimensional surface
orientation from image textural properties is described.
Introduced is a new representational and computational tool,
the normalized textural property map, which unites and
exploits a large class of low-level image heuristics. An
example is given of an application of the paradigm to an
(abstract) textured image. Some comments reflect on the
relation of this work to existing work on shape.


INTRODUCTION 

One central task of image understanding is the
recovery of three-dimensional scene information from the
two-dimensional perspective transformation that is the
image. Generally what is sought is the location and
structure of the small number of objects that comprise the
typical scene. However, a static, monocular image presents
much less to the eye. Objects are visible only as non-
occluded opaque surfaces. Surfaces are distinguished by
local surface properties (shading, color, texture) subject to
deformations due to variations in slant, lighting conditions,
relative distance, and other influences. Recovering object
descriptions from such a melange of distortions is a
formidable task. This paper considers only one surface
property (texture) subject to only a small set of imaging
phenomena (mainly, the effects of surface orientation).

The approach is necessarily heuristic. Given the
infinite number of scenes that can produce identical images,
reversing the projection--going from distorted surface
properties to three-dimensional surface shapes--requires
the judicious use of assumptions about surfaces and the
imaging process. The first concern of this paper is what
exactly needs to be assumed in order to proceed with the
mathematics of analysis. The second, related one is the
constraint equations that result.

Most of this paper discusses a new computational
paradigm that integrates textural properties, image
heuristics, and a representation of surface orientation in a
way that allows the quantification of constraints on local
surface orientations. The paper also presents, by example,
the paradigm at work. The class of relatively model-free
constraint equations that results from it is applied to an
(abstract) textured scene. Discussion follows. Further
results and discussion, as well as much more supporting
mathematics, can be found in the author's forthcoming thesis
[Kender, 1979].


THE  PARADIGM 

In the image forming process, surfaces are
perspectively projected onto two-dimensional regions of the
imaging retina. The images of the texture objects which
define the surface are distorted by local surface orientation,
relative surface distance, and the characteristics of the
imaging device. The task of the visual processor is to
deconvolute these effects, but it is only the effects of the
last influence, the camera parameters, that are usually
known a priori. Recovery of the others depends on
simplifying assumptions, based ultimately on intuitive
concepts about the physical world. These are the notions of
texture "regularity" (both in texture object, and in texture
object placement with respect to the surface), and of
surface opacity and "smoothness". The general paradigm
exploits each of these assumptions in the creation and
refinement of the analytic framework described below.

The fundamental conceptual and representational tool
is the new one of the normalized textural property map.
Intuitively, this map relates a given two-dimensional image
texel to the small class of three-dimensional texture objects
which may have been its source in the scene. More
precisely, it is a way of "deprojecting" the effects that
surface orientation has on primitive textural properties such
as slope in the image, length of major axis of elongation, etc.
The map summarizes the answers to the following two-
dimensional family of questions: if the microplane
underlying this texture object were inclined at this specified
three-space orientation, what would the textural property
have had to have been in the external scene in order to
observe the given textural property in the image?
Mathematically, a normalized textural property map is a
function whose arguments are the hypothesized surface
orientation descriptors, and whose value is the deprojected,
"in the scene" normalized textural property. Graphically,
the map can be represented as a surface in a three-dimensional
coordinate system, with the two-dimensional representation
of surface orientation providing the underlying grid, and the
normalized textural property providing the vertical axis. It
can also be represented as a contour map in the usual way.
The contours trace out those microplane orientations giving
rise to the same normalized property.


THE  NORMALIZED  TEXTURAL  PROPERTY  MAP 

The normalized textural property map is composed of
four basic ingredients. First, a representation for
microplane orientation is chosen. The gradient space
[Huffman, 1971] is one candidate; it represents surface slant
by assuming that all microplanes satisfy the equation:

    $-z = px + qy + c$

This assumes a rectangular coordinate system that aligns the
z axis with the line of sight. Surface orientation is then
represented by

    $(-\Delta z/\Delta x, -\Delta z/\Delta y) = (p, q)$

which is the gradient, hence the name of the space. Note,
however, that other representations for surface slope are
possible. For example, it is often the case that representing
surface orientations by a spherical coordinate system (the
"Gaussian sphere") yields simpler normalized maps.

A second important constituent of the contour map is
the consideration of the imaging process. If the image is
considered to have resulted from a perspective
transformation, the focal length and the central focus point
of the imaging apparatus must be known. This complicates
the analysis considerably. However, if the image has arisen
from an orthographic imaging system, no imaging parameters
are required. The focal length becomes infinitely large, and
every point in the image can be considered the central
focus point. This simplifies the deprojection: the image
point (x, y), deprojected onto the plane $-z = px + qy + c$, is
$(x, y, -(px + qy + c))$.

In terms of the paradigm, orthography also
guarantees that the normalized textural property map is
independent of the position of the texel in the image. This
is both an advantage and a disadvantage; what is lost are the
often powerful constraints perspective deformation imposes
on the determination of surface slant. In fact, it can be
shown that under orthography some textural properties
provide no surface orientation information at all, although
they are very useful--and even elegant--under perspective
(for example, edge slope [Kender, 1978]). The remainder of
this paper will concern only orthographic projection, for
simplicity's sake.

The third component in creating the normalized map is
the choice of textural property. The following properties,
among others, are easily derivable in the image: texel slope,
texel length (of a major axis of elongation, say, or of a line
element), angle (between texel slopes, or the "T" and "V"
joints formed from adjoining larger texels [Maleson, 1977]),
areal measures such as area and density (the count of
distinguished events, such as edges [Rosenfeld et al., 1970]
or relative extrema [Mitchell et al., 1977], per unit of image
area), and other combinations of the above, such as
eccentricity (the ratio of two special lengths, the major and
minor axes [Stevens, 1979]), or skewed symmetry (the angle
between two special lines, the symmetry axis and the
transverse axis [Kanade et al., 1979]). Each such property
generates its own class of normalized map. In general, the
maps for the "higher" properties such as eccentricity and
skewed symmetry are special cases of simpler ones.

Lastly, to create the map requires the invocation of a
general physical world assumption: the uniqueness of the
relationship of the texture object to the microplane. That is,
a given textural property--say, length--defines the surface
associated with it by either lying wholly within it (the
texture is "painted"), or by extending above it (it is
"pointed"), but it can not do both. (Examples of "pointed"
textures are forests of trees, or metropolitan downtowns).
This is, of course, a continuum, extending from tangent
relations to normal ones, but the analysis is very different
at either extreme. The examples that follow will assume
tangency, the more general case.

These four ingredients come together in the following
way. The surface orientation representation provides the
coordinate system and the hypothesized scene planes. The
choice of textural property defines the texel class that is to
be deprojected. The camera model provides the
deprojecting function. And the assumption of texture
object-to-surface relationship provides further information
as to how the various components of the texel are to be
deprojected with respect to each other.

As an example, consider the normalized map
consisting of the gradient space, the property of length,
orthographic projection, and tangency. Let the length
element in the image extend from (0, 0) to (L, 0) in the
image; its length is L. Assuming tangency allows the
deprojection of both ends onto the plane $-z = px + qy + c$
as follows: (0, 0, -c) to (L, 0, -(pL + c)). The deprojected
length element has normalized length

    $L_n = L \sqrt{1 + p^2}$

The contour form of the map is shown in Fig. 1. Due
to orthography, this result holds for all equally long length
elements oriented parallel to the one given in the example.
Further, due to the gradient space, the result for any other
slope is simply a rotation of the normalized map to that
corresponding direction; see Fig. 2. A catalogue of many
such maps for varying ingredients appears in the author's
thesis. For example, the map for density under the same
assumptions as the above illustration is

    $D_n = D / \sqrt{1 + p^2 + q^2}$

which, as expected, is rotationally symmetric.


THE MAP REFINED

The paradigm continues with the two following
refinement steps. The normalized textural property map
gives no information about surface orientation by itself; it
only expresses relationships. It is here that the second and
third heuristics about the physical world are invoked. They
are the continuity of texture objects ("regularity"), and the
continuity of local surface orientations ("smoothness"). In
the extreme, these heuristics state that all texture objects in
three-space are identical (that is, the texture approaches a
"structural" texture), and that all local surfaces are planar.
Although not always true, assuming the extremes in small
neighborhoods does allow the following mathematical
constraints on surface orientation.

Continuity of texture object asserts that neighboring
texels must have the same normalized textural property
values. All other influences being equal, their maps would
be identical. Continuity of local surface orientation similarly
asserts that, locally, observed textural properties never
differ because of variations in inclination. What accounts
for dissimilarities is the one degree of freedom left. And
that is the relative placement of the texture object with
respect to the surface, within the limits of the previous
assumption concerning texture object-surface relation. If
the assumption was that a texture object lies in the plane, it
can still rotate freely in it. Even if the assumption were,
say, that it was to be within n° of the surface normal, many
placements lying within the implied cone are possible.

Having assumed away any differences arising from
variations in local texture objects and local surface
orientations, the normalized textural property maps that
result from adjacent texels differ only because of placement
effects. This last is removed by intersecting the two maps
and finding those microplane orientations that have the same
normalized property value. In effect, this mathematical
operation of finding all (p, q) so that $P_1(p, q) = P_2(p, q)$
states that only a small set of surface orientations in the
scene could have distorted both texture objects
simultaneously in such a way that their images can be
deprojected into equal normalized values.

As an example, take as texels two length elements as
in Figs. 1 and 2. Assume the other constraints as given with
the figures. If the length elements appear "nearby" in an
image (as in Fig. 3, a very small section of a texture), create
and intersect the surfaces of their normalized textural
property maps. The result (in Fig. 4) is the set of
possible local surface orientations that could have distorted
the assumed three-dimensional regularity (here, equal
length) into the given image.

The analysis, of course, can be done strictly
mathematically. In the case of two equal length elements
the constraint equations, in general, describe a hyperbola.
In the above example, it is specified by

    $L_1 \sqrt{1 + p^2} = L_2 \sqrt{1 + q^2}$

where the $L_i$ are image measurements. In general, this step of
the paradigm ends with a one-dimensional family of plausible
surface orientations. It can be further refined by
intersecting it with the normalized map of a third nearby
texel of the same class. However, often a more useful type
of refinement is possible.
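
The length map and the intersection step can be sketched numerically as follows. This is a minimal illustration, not code from the paper: it assumes orthography, tangency, and the plane -z = px + qy + c, and the function names normalized_length and q_from_p are hypothetical. The general-angle form L sqrt(1 + (p cos t + q sin t)^2) follows from deprojecting both endpoints of an image element at angle t onto the plane, and it reduces to the formula in the text when t = 0.

```python
import math


def normalized_length(L, theta, p, q):
    """Scene length of an image line element of length L lying at angle
    theta (radians from the image x axis), for a microplane of gradient
    (p, q). Assumes orthographic projection and tangency."""
    # Depth difference between the two deprojected endpoints.
    dz = L * (p * math.cos(theta) + q * math.sin(theta))
    return math.hypot(L, dz)


def q_from_p(L1, L2, p):
    """Solve L1*sqrt(1 + p^2) = L2*sqrt(1 + q^2) for |q|, where L1 is
    measured along the image x axis and L2 along the image y axis.
    Returns None when no real orientation satisfies the constraint."""
    q_squared = (L1 / L2) ** 2 * (1.0 + p * p) - 1.0
    return math.sqrt(q_squared) if q_squared >= 0.0 else None
```

Sweeping p and collecting (p, q_from_p(L1, L2, p)) together with its mirror image (p, -q) traces the one-dimensional family of orientations referred to above. Any point on that locus gives both elements the same value of normalized_length.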


THE SECOND REFINEMENT

Suppose the texture is rich enough so that at least
one other different textural property can be discovered and
assumed to be "regular". Executing the first two steps of
the paradigm separately for both textural properties will
usually result in a one-dimensional locus of allowable surface
orientations in both maps. Again invoking the assumptions
of continuity of texture object and surface orientation
allows these maps to be intersected (or their constraint
equations to be mutually solved). That is, although the
textural properties (and their maps) may be different, the
underlying scenic texture objects that account for both
properties are assumed to be physically identical. Under
orthography, the analysis usually ends at this stage with a
Necker pair of surface orientations (that is, both (p, q) and
(-p, -q)). Choosing one or the other of the pair requires the
invocation of the last assumption--the uniqueness of surface
orientation--in whatever subsequent integrative or
relaxation method would be used to recover actual surface
depth.

In the example of Fig. 3, assume that the two texture
objects in three-space are also mutually perpendicular (that
is, their placement orientations are equally offset from the
axes of any local coordinate system in the microplane).
After the first refinement step, the paradigm generates the
constraint equation $pq = 0$, and a normalized map consisting
of the p and q axes themselves. Fig. 5 shows these
constraints intersected with the constraints of Fig. 4. The
resulting Necker pair of surface orientations indicates the
two possible surface orientations for the microplane formed
in Fig. 3.
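
The final intersection can be worked through in a few lines. The sketch below is illustrative only. It assumes the two image elements lie along the image x and y axes with measured lengths L1 and L2, so that the equal-length constraint is L1 sqrt(1 + p^2) = L2 sqrt(1 + q^2) and the perpendicularity constraint is pq = 0. The function name necker_pair is hypothetical.

```python
import math


def necker_pair(L1, L2):
    """Intersect L1*sqrt(1 + p^2) = L2*sqrt(1 + q^2) with p*q = 0.
    Returns the two gradient-space solutions (p, q) and (-p, -q)."""
    if L1 <= 0.0 or L2 <= 0.0:
        raise ValueError("image lengths must be positive")
    if L2 >= L1:
        # q = 0: the element along x is foreshortened, so the plane
        # tilts about the image y axis.
        p = math.sqrt((L2 / L1) ** 2 - 1.0)
        return [(p, 0.0), (-p, 0.0)]
    # p = 0: the element along y is the foreshortened one.
    q = math.sqrt((L1 / L2) ** 2 - 1.0)
    return [(0.0, q), (0.0, -q)]
```

With image lengths 1 and 2, the pair is (sqrt(3), 0) and (-sqrt(3), 0): the shorter element has been foreshortened by a factor of two. When the two image lengths are equal, both solutions collapse to (0, 0), the frontal plane, which is one of the degenerate cases.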


SUMMARY  AND  COMMENT 

The three steps of the surface orientation-from-image
texture paradigm are as follows. Step 1: Choose a textural
property, find a texel possessing it, and generate its
normalized textural property map. Step 2: Choose a similar
nearby texel from the same texture, apply Step 1 to it, then
intersect the normalized maps: call the resulting map P1.
Step 3: Choose a second textural property, apply Steps 1
and 2 to get map P2; then intersect maps P1 and P2. Except
for certain degenerate cases, this last map is a zero-
dimensional set of possible surface orientations for the
microplane defined by the local texels. Under perspective,
the result is often a unique value.

The normalized textural property map is a close
relative to the reflectance map of Horn [Horn, 1977].
Reflectance can be considered as a very primitive textural
property. Its image counterpart, pixel brightness, is the
most primitive texel possible (a pixel). There is one
important difference between normalized maps and
reflectance maps, however. A reflectance map assumes
knowledge of the (constant) illuminant strength. In the
corresponding case of brightness as textural property, the
normalized map represents what the unknown illuminant
strength would have had to have been; this varies for each
surface orientation.

The paradigm given above can be shown to subsume
the method of photometric stereo [Woodham, 1978], where
the need for differing textural properties is met by varying
the scene lighting conditions. Additionally, the paradigm can
show that density behaves identically to one restricted case
of reflectance. It can therefore add theoretical weight to
the heuristic method of blurring texels in order to treat
textured regions as shaded regions.

A comment on a different level: much of the paradigm
requires the use of heuristic decisions, such as the
assumptions that "nearby" textural properties are, in fact,
identical. Such assumptions can be expressed as the meta-
heuristic: "apparent » true." This finds its expression in
the paradigm by such judgements as "nearly equal lengths
are equal in three-space, but have been foreshortened,"
"nearly parallel lines are parallel in three-space, but have
been perspectively converged," etc. The meta-heuristic is a
restatement of the general view hypothesis, in a form
exploitable by this new computational method.

Lastly, the paradigm can throw a little light on the
meaning of that obscure object of this study, "texture." It
implies that image elements form a texture if they are
considered to define a surface and are locally identical in
terms of primitive properties. Or, restating, a texture is a
collection of image elements whose regularity can be
exploited in a computational way to abstract the orientation
of their associated surface. Thus, a group of image
elements becomes a "texture" when their properties become
regular enough to be used to discover shape.

Fig. 2. The normalized textural property map under the
conditions of Fig. 1, except that the unit length element is
oriented at 45°.

Fig. 3. A simple texture composed of equal length texture
objects seen in orthographic perspective.

Fig. 4. The execution of the first two steps of the paradigm
on the two labeled texels of Fig. 3. The two
normalized textural property maps intersect in a hyperbola.
This constrains the possible surface orientations.

Fig. 5. The execution of the third step of the paradigm. The
hyperbola of Fig. 4 is intersected with the degenerate
hyperbola consisting of the p and q axes. The latter was
obtained by executing the paradigm under the additional
assumption that the lengths of Fig. 3 are perpendicular in
three-space. A Necker pair of possible surface orientations
results.

ABSTRACT

Our recent work in navigation using passively
sensed imagery has concerned the extension of the
role of the stereo system, the use of the Night
Vision Laboratory terrain model imagery for image
velocity sensor experiments, and the study of
methods for dealing with vehicle structural
flexure effects on the stereo image pair.

INTRODUCTION 

The concept of using passively sensed imagery
for navigation of a low-flying, slow-speed vehicle
was described in Ref. 1. Briefly, an image
velocity sensor (IVS) compares sequential image frames
to obtain the velocity to altitude ratio, V/H,
while a stereo system (Ref. 2) computes H, the
altitude of the vehicle above the ground, so that the
velocity, V, can be obtained. Velocity is also
computed using conventional instruments and an
estimate of wind velocity. Dead reckoning
navigation, based on a best estimate of wind velocity,
is updated periodically with positional
corrections made using the passively sensed images.

This paper describes recent activity in the
following topic areas: (1) extending the role of
the stereo system, (2) the Night Vision Laboratory
terrain model and associated imagery, (3) the
IVS experiments, and (4) recent stereo studies.

EXTENDING  THE  ROLE  OF  THE  STEREO  SYSTEM 

The original role of the stereo system was to
determine the altitude H, so that the velocity V
could be obtained from the V/H measurement obtained
by the image velocity sensor. However, as the
characteristics of the stereo system were examined,
it was noted that there was potential for an
expanded role for it. In particular, the system
could provide a terrain map in addition to the
V/H information, as discussed below.

TERRAIN  IMAGE  ASPECTS 

The stereo system is capable of producing a
2-D terrain image in which relative elevations of
image features are given. Thus, we can obtain an
image whose pixel values are some fixed positive
or negative bias from the actual elevation above
sea level. We have been considering ways of
matching this type of image against a data base
of terrain images in order to obtain positional
corrections for the vehicle.

The main advantage of this approach for
position verification is the robustness of the
derived terrain image compared to images based on
scene intensity values. (The derived terrain image
should be insensitive to sensor characteristics
and lighting conditions, unlike the case of
comparing sensed intensity images with reference
intensity images.)

The  following  problem  areas  are  therefore 
currently  receiving  attention: 

1 - Data  base  construction.  Determination  of  what 
features  can  be  extracted  from  a topographic  image, 
and  the  data  compression  and  smoothing  techniques 
available. 

2 - Topographic image determination. At present,
to solve for the camera model one finds corresponding
regions in a pair of images. We are examining
whether feature-based correspondence offers any
advantages.

3 - Terrain matching. There are several approaches
to terrain matching being considered. In addition
to the 1-D TERCOM approach, there are the following
2-D alternatives:

• area correlation based on terrain elevation
values rather than scene intensity.

• spatial  matching  of  terrain  features,  using 
techniques  such  as  chamfer  matching,  array 
relaxation,  minimal  spanning  tree  matching, 
or  Voronoi  decompositions. 

• symbolic description matching, using techniques
such as & algorithms, graph relaxation,
and dynamic programming.
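
The first of the 2-D alternatives can be sketched as follows. Because the derived terrain image differs from true elevation by a fixed bias, the sketch removes the mean from each window before comparing. The exhaustive search, the sum-of-squared-differences score, and the function names are assumptions made for illustration, not the matching method under study.

```python
def _zero_mean(window):
    """Flatten a 2-D window and subtract its mean (removes a constant bias)."""
    flat = [v for row in window for v in row]
    mean = sum(flat) / len(flat)
    return [v - mean for v in flat]


def match_terrain_patch(reference, patch):
    """Return the (row, col) in `reference` where `patch` fits best.

    Both arguments are lists of rows of elevation values. The score is the
    sum of squared differences after mean removal, so a constant elevation
    offset in the sensed patch does not affect the result.
    """
    ph, pw = len(patch), len(patch[0])
    p0 = _zero_mean(patch)
    best_err, best_pos = float("inf"), None
    for r in range(len(reference) - ph + 1):
        for c in range(len(reference[0]) - pw + 1):
            window = [row[c:c + pw] for row in reference[r:r + ph]]
            w0 = _zero_mean(window)
            err = sum((a - b) ** 2 for a, b in zip(w0, p0))
            if err < best_err:
                best_err, best_pos = err, (r, c)
    return best_pos
```

A sensed patch cut from the reference map and shifted by any constant elevation should be located at its original position.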

USING  STEREO  FOR  V/H  MEASUREMENT 

The original plan for measuring velocity from
imagery called for height (H) to be measured by
high-resolution stereo and the ratio of velocity
to height (V/H) to be measured by the IVS.
However, on a vehicle with no inertial reference
system, the IVS is subject to errors caused by
changes in attitude of the vehicle. A change in
attitude causes a shift in the image which will
be confused with a shift caused by linear motion,
causing a false reading of V/H to be obtained.
Therefore, a different way of obtaining the
V/H information has been investigated. This method
uses stereo with a baseline produced by vehicle
motion, using a single camera. The stereo camera
model solver can compute the change in vehicle
attitude. Then the result of the stereo computation
is the ratio of height to the length of the
stereo baseline (R), that is, H/R. Combining this
with H from high-resolution stereo as before
produces R, that is, the distance that the vehicle
moved between pictures. Furthermore, the
stereo camera model gives the direction of this
motion relative to the camera coordinate system,
so that the vector motion is known. Dividing
this by the time interval produces velocity.
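
The chain of computations in this paragraph amounts to R = H / (H/R) followed by a division by the time interval. A minimal sketch, with a hypothetical function name and hypothetical input values:

```python
import math


def velocity_from_motion_stereo(h_over_r, altitude_ft, direction, dt_s):
    """Velocity vector (ft/s) in camera coordinates.

    h_over_r    -- ratio H/R from the single-camera motion stereo
    altitude_ft -- H from the high-resolution stereo
    direction   -- direction of motion from the camera model (any length)
    dt_s        -- time between the two exposures, in seconds
    """
    baseline_ft = altitude_ft / h_over_r              # R = H / (H/R)
    norm = math.sqrt(sum(d * d for d in direction))   # normalize the direction
    return [baseline_ft * d / norm / dt_s for d in direction]


# Hypothetical case: H/R = 5 at H = 1000 ft gives R = 200 ft; over 2 s
# along the camera x axis this is 100 ft/s.
v = velocity_from_motion_stereo(5.0, 1000.0, (1.0, 0.0, 0.0), 2.0)
```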

If  the  camera-cluster  method  of  calibrating 
the  camera  model  for  the  high  resolution  stereo 
is  used,  as  described  below,  one  of  these 
same  camera  clusters  can  be  used  for  the  stereo 
needed  for  the  H/R  computation.  The  exposures 
would  have  to  be  very  close  in  time,  say  about  ten 
feet  of  vehicle  motion,  so  that  some  of  the  same 
points  will  still  be  in  the  fields  of  view  of  the 
narrow-angle  cameras.  The  camera  model  can  be 
very  accurately  calibrated  because  of  the  high- 
resolution  of  these  cameras  and  the  wide  field  of 
view that they collectively span, so that an
accurate H/R can be computed with the short ten-foot
baseline.  These  considerations  are  the  same  as 
those  discussed  below  in  connection  with 
obtaining  H by  this  method. 

No  matter  what  method  of  calibrating  the  high- 
resolution  stereo  is  used,  a single  wide-angle 
camera  can  be  used  to  obtain  the  stereo  for  the 
H/R  measurement.  This  could  be  the  same  camera 
used to obtain stereo for map-matching or to
obtain non-stereo information. Because of the
poorer  angular  resolution  of  the  wide-angle  camera, 
the  baseline  would  have  to  be  longer  (a  few  hundred 
feet).  However,  this  is  no  problem  as  long  as 
there  is  sufficient  overlap  between  views  and 
sufficiently  low  distortion,  because  the  baseline 
is  generated  by  vehicle  motion.  The  accuracy  that 
can  be  obtained  by  this  method  is  given  in 
Appendix  A. 

IMAGERY  FROM  THE  NIGHT  VISION 
LABORATORY  TERRAIN  MODEL 

The Night Vision Laboratory (NVL) terrain model is a
physical model of terrain at a scale of 400:1. A detailed
topographic map of the model is available from
NVL (Ref. 4). A gantry crane arrangement traveling
over the model is used to obtain video imagery.
Although at present movement of the crane and the
camera orientations are set manually, a future
enhancement will allow both the gantry and the
camera to be moved under computer control.

This model offers a source of terrain imagery
controllable with respect to:

• flight path

• illumination

• camera orientation

• type of camera

• selection of terrain type (cultural or
rural features)

Selected frames from a video tape of several
passes over the terrain map have been digitized
and used for image velocity sensor experiments
(see below) and will be used for stereo
experiments to derive topographic images. Two typical
scenes are shown in Figures 1 and 2, and the
associated topographic map for the region is shown in
Fig. 3. Initial stereo results using an NVL image
pair are given in Appendix B.

IMAGE  VELOCITY  SENSOR  EXPERIMENTS 

A series  of  experiments  using  the  image 
velocity  sensor  on  the  region  shown  in  Fig.  1 were 
conducted.  A pair  of  128  x 128  regions  were 
extracted  from  two  512  x 512  frames  and  processed 
using  the  special  IVS  processor. 

It was found that the camera was forward
pointing and not aimed along the flight path.
This resulted in large perspective effects in the
images; from frame to frame, objects at the top
of an image moved less than those at the bottom.
In addition, objects on the left side of the image
moved parallel to the direction of motion, while
those on the right side translated from left to
right. As shown in Fig. 4, the IVS displacements
depended on the portion of the image from which
the 128 x 128 subregion was selected. This effect
occurred even for images having 80% overlap.

Thus, we found that if the vehicle pitch and
roll are large enough to cause perspective effects,
the IVS should use a 128 x 128 image covering a
large field of view to compensate somewhat for
the offset errors. This was verified by compressing
the 512 x 512 image (by sampling to 128 x 128)
and running the experiment again.

STRUCTURAL FLEXURE COMPENSATION

In order to determine vehicle altitude using
stereo, we cannot use two looks of a single sensor
because we do not know the baseline traveled.
Instead, we must use two cameras, mounted on
opposite wings or mounted fore and aft on the
fuselage. If we are dealing with flight altitudes
on the order of 1000 feet and cameras that are at
least 10 feet apart, a 1% accuracy in altitude
determination requires a resolution of 0.1
milliradians. When dealing with this accuracy in
resolution, the relative camera orientation change
due to vehicle structural flexure becomes
important. Various techniques for measuring the
camera-relative orientations were investigated,
including the sensor cluster approach shown in
Fig. 5.
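
The 0.1 milliradian figure can be checked with a small-angle calculation. For a baseline B and altitude H the parallax angle is approximately B/H, and since H = B/angle, a relative error in the angle produces the same relative error in H. The function below is a sketch of that reasoning under the small-angle assumption; its name is hypothetical.

```python
def required_angular_resolution(altitude_ft, baseline_ft, relative_accuracy):
    """Angular resolution (radians) needed for a given relative altitude accuracy.

    Small-angle model: parallax = B / H, and dH / H = d(parallax) / parallax.
    """
    parallax_rad = baseline_ft / altitude_ft
    return relative_accuracy * parallax_rad


# 1000 ft altitude, 10 ft baseline, 1% accuracy: 0.01 rad * 0.01 = 0.0001 rad.
resolution_rad = required_angular_resolution(1000.0, 10.0, 0.01)
resolution_mrad = resolution_rad * 1000.0
```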

In the cluster calibration approach, a
cluster of sensors having a narrow field of view
is used at each position, as shown in Fig. 5.
A series of experiments was made for the various
cluster forms shown in Fig. 5, and the following
table shows the resulting pan accuracy due to the
uncertainty in the point group positions, and the
pan error resulting from a scale factor error of 1
part in 1000 in the point spacings. (With single
cameras, this would correspond to an error in one
focal length of 1 part in 1000.) The values are
in degrees.


Point     Pan Error      Pan Error
Group     From Points    From Scale

3A        .0109          .058
3B        .0109          .058
4A        .0091          .055
4B        .0029          .00057
5         .0029          .00057

The roll error in all of the above cases was
less than .02 degree. In Ref. 3 we determined
that a maximum pan error on the order of .0057
degree and a maximum roll error on the order of
.29 degree were allowable. We can therefore see
that the last two cases are satisfactory, whereas
the first three are unsatisfactory with the assumed
scale factor accuracy and would still be
questionable even if the scale were known exactly.
Therefore, it seems safe to assume that at least four
points will be needed with this method.
Furthermore, the point or points for the main height
measurement should be directly under the cameras.
Since Group 4B does not contain such a point, an
extra pair of cameras would seem to be required
with it, and its results might as well be used in
the camera model determination. Therefore, Group
5 apparently is the best arrangement to use, since
it contains such a point. The total effect on
height accuracy of the camera model determined
from Group 5 is 5.0 feet. Combined with the point
accuracy of 10 feet, this produces a total accuracy
of 11.2 feet.
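
The screening of the point groups and the final error figure can be reproduced in a few lines. The pan limit and table values are those quoted above; treating the 5.0-foot and 10-foot contributions as independent errors combined by root-sum-square is an assumption, although it does reproduce the quoted 11.2 feet.

```python
import math

MAX_PAN_ERROR_DEG = 0.0057  # allowable pan error quoted from Ref. 3

# (pan error from points, pan error from scale), in degrees, from the table
pan_errors = {
    "3A": (0.0109, 0.058),
    "3B": (0.0109, 0.058),
    "4A": (0.0091, 0.055),
    "4B": (0.0029, 0.00057),
    "5": (0.0029, 0.00057),
}

# A group passes only if both of its pan error sources are within the limit.
satisfactory = [group for group, errors in pan_errors.items()
                if max(errors) <= MAX_PAN_ERROR_DEG]

# Assumed root-sum-square combination of the camera model effect (5.0 ft)
# and the individual point accuracy (10 ft).
total_height_error_ft = math.hypot(5.0, 10.0)
```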
APPENDIX A

ACCURACY STUDIES OF STEREO
V/H MEASUREMENT

A computer simulation was done to test the
accuracy of calibrating the stereo camera model
when using a single wide-angle camera. The following
assumptions are made: the height is 1000 feet,
each image is 500 pixels by 500 pixels, and the
accuracy of the stereo matches is one pixel. A
nominal camera model was assumed, with azimuth
equal to 90 degrees and the other four angles
all zero. Several cases were considered: 80%
overlap between the two pictures, with five points,
one in the center and one 50 pixels from each
corner of the common region, as shown in Figure A-1;
and points spaced every 100 pixels over the common
region, with overlaps of 80%, 60%, and 40%,
producing 20, 15, and 10 points, respectively, as
shown in Figure A-2. Furthermore, for the cases
with 80% overlap, two different camera fields of
view were assumed: a focal length of 500 times the
pixel spacing, corresponding to a field of view of
53 degrees by 53 degrees (1000 feet by 1000 feet
on the ground); and a focal length of 250 times
the pixel spacing, corresponding to a field of
view of 90 degrees by 90 degrees (2000 feet by
2000 feet on the ground). For the cases with 60%
and 40% overlap, only the 500-pixel focal length
was used (1000-foot view on the ground). From the
above assumptions, it can be derived that the
length of the stereo baseline (separation of the
two camera positions) has the following values:
400 feet for the cases with the 250-pixel focal
length (80% overlap); and 200, 400, or 600 feet for
the cases with the 500-pixel focal length,
according to whether the overlap is 80%, 60%, or 40%,
respectively.
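
The fields of view, ground coverage, and baselines quoted above follow from simple pinhole geometry for a camera looking straight down. The sketch below re-derives them; the function names are hypothetical, and the baseline is taken to be the non-overlapping fraction of the ground coverage.

```python
import math


def field_of_view_deg(image_px, focal_px):
    """Full field of view of a pinhole camera, in degrees."""
    return 2.0 * math.degrees(math.atan((image_px / 2.0) / focal_px))


def ground_coverage_ft(height_ft, image_px, focal_px):
    """Width of ground seen by a downward-looking camera at the given height."""
    return height_ft * image_px / focal_px


def baseline_ft(coverage_ft, overlap):
    """Camera displacement that leaves the given fractional overlap."""
    return coverage_ft * (1.0 - overlap)


narrow = ground_coverage_ft(1000.0, 500, 500)   # 500-pixel focal length
wide = ground_coverage_ft(1000.0, 500, 250)     # 250-pixel focal length
baselines_narrow = [baseline_ft(narrow, o) for o in (0.8, 0.6, 0.4)]
baseline_wide = baseline_ft(wide, 0.8)
```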

For each of the above cases, a camera model
solution was done using the points in the
overlapping region, and the accuracy of the resulting
camera model was propagated into the value of
height computed using this camera model, for
points approximately in the center of the field of
view. The resulting height accuracy (considering
camera model effects only) for each of these cases
is shown in the following table.

                                             Height
       Overlap              View   Baseline  Accuracy
Fig    %          Points    Ft     Ft        Ft

1      80         5         2000   400       10
2      80         20        2000   400       9
1      80         5         1000   200       42
2      80         20        1000   200       36
2      60         15        1000   400       28
2      40         10        1000   600       38

Since we are looking for an accuracy on the
order of 1% (10 feet at an altitude of 1000 feet),
only the first two cases in the above table are
satisfactory. (This comment applies only to the
H/R measurement being discussed here. For stereo
used for map matching, the relative accuracy
rather than the absolute accuracy is more
important, and the camera model effects are not very
important, since they tend to affect nearby points
almost equally. The accuracy considering only
individual point errors is 10 feet or less for all
of the above cases.) However, these first two
cases require a wide field of view (90 degrees),
and thus may not be practical.

In order to achieve the desired accuracy with
the 53-degree field of view, two possibilities
present themselves. One is that the accuracy of
the points may be better than the assumed one
pixel. With a good signal-to-noise ratio in the
pictures it is possible to achieve a standard
deviation of .28 pixel (or even better if
interpolation is used). Multiplying by this factor shows
that the fifth case in the table would be less than
10 feet, and none of the cases would be very far
over 10 feet. However, experiments must be done
with real images of typical terrain to determine
whether this accuracy can be consistently obtained.
The other possibility is to use a greater number
of points. With a given distribution of points,
the standard deviation of the result is inversely
proportional to the square root of the number of
points. Therefore, instead of the third case in
the table, suppose that each of the five points
is replaced by a cluster of 25 points, so that
there are 125 points in all. (This could be done
by constraining the interest operator to look for
points only in the 100-by-100 areas of the picture
centered on these points.) The fact that there is
some spread in these clusters around the original
points would produce a slight additional
improvement in accuracy (provided that the cluster is
centered on the original point), but if this fact
is neglected, the accuracy of height would be five
times as good, or about 8 feet. Again, experiments
must be done with real images of typical terrain
to determine whether the interest operator can
reliably find good points this closely packed.
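
Both scalings are simple enough to verify numerically. The sketch below applies the .28-pixel factor to every row of the table and the square-root-of-N rule to the third case; the variable and function names are illustrative.

```python
import math

# Height accuracy (ft) for the six cases of the table, at one-pixel matching.
table_accuracy_ft = [10, 9, 42, 36, 28, 38]


def scale_for_match_accuracy(accuracy_ft, pixel_sigma):
    """Height error scales linearly with the matching error in pixels."""
    return accuracy_ft * pixel_sigma


def scale_for_point_count(accuracy_ft, old_points, new_points):
    """Height error scales as 1 / sqrt(number of points)."""
    return accuracy_ft * math.sqrt(old_points / new_points)


# First possibility: matching accuracy of .28 pixel instead of one pixel.
with_better_matching = [scale_for_match_accuracy(a, 0.28) for a in table_accuracy_ft]

# Second possibility: third case with 125 points instead of 5.
with_clusters = scale_for_point_count(table_accuracy_ft[2], 5, 125)
```

The fifth case comes out at about 7.8 feet and the clustered third case at 8.4 feet, in line with the figures in the text.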

An important additional point must be
mentioned. That is, the above analysis assumes that
there is no systematic error in the position of
the points, such as would be caused by camera
distortion. Appreciable distortion would
introduce a large absolute error into the height that
would make this method completely unsuitable for
the purpose being discussed. (Unless the
distortion is large, the method would still be useful
for other stereo purposes, however, where only
relative accuracy is important.) Thus a solid-state
TV camera with a lens whose distortion is
accurately known would probably be required. Any
type of scanning using the deflection of an
electron beam would produce variable distortions that
would probably be too large.

APPENDIX  B 

STEREO  RUN  USING  NIGHT  VISION 
LABORATORY  IMAGE  PAIR 

The Stanford stereo system (Ref. 2) has been
used to compute the camera models on a pair of
Night Vision Laboratory images. Of particular
significance in this initial effort was the number
of points that the "interest operator" in the
program would find, and whether the camera parameters
would be found successfully.

The two images used are shown in Figs. B-1 and
B-2, and the points of interest found are shown in
Figs. B-3 and B-4, with each interest point marked
with an arbitrary code. At least 50 points of
interest were found, with good spatial coverage,
and there was no problem with ambiguous points.
In Fig. B-5 the points of interest for each image
have been superimposed, and a line drawn between
some of them to indicate the motion of the points.
Note that points on the left side of Fig. B-5
have moved almost vertically (e.g. points 2 and
*), while the points on the right (such as * and R)
also exhibit a movement to the right.

The computed camera parameters and the
associated standard deviations are shown in Table B-1.
Note that the pan, tilt, and roll are small, as
are their standard deviations. While the camera
conditions under which the images were taken are
not available, the results for elevation, about
65 degrees down from the horizontal, and azimuth,
pointing to the right of the line of motion, do
agree with human observation.

The  experiments  will  be  continued  to  obtain 
the  terrain  elevation  image  for  this  pair  of 
images.  The  terrain  image  will  then  be  compared 
with  the  NVL  terrain  map.  Further  efforts  will 
concentrate  on  feature  analysis. 

Table B-1

Camera Parameters Obtained

             Angle        Standard Deviation
             (Degrees)    (Degrees)

Azimuth      -14.8        3.4
Elevation     84.7        1.2
Pan           -0.3        0.2
Tilt          -0.6        0.4
Roll           0.3        0.1

Fig.  2 'Town'  Image  Obtained  From  the  NVL 
Terrain  Model 


the  NVL  Terrain  Model 
6 Some Sensor Clusters Investigated


Fig.  3 Portion  of  Topographic  Map  of  NVL 
Terrain  Model 
Right Picture


Fig. A-1 Case of 80% Overlap


Left Picture


Fig. 6 Image Velocity Sensor Offsets Obtained
from Various Portions of the Image


Right Picture

Fig. A-2 Cases of 40%, 60%, and 80% Overlap


Fig.  3 Location  of  Sensor  Cluster  on  Vehicle 


Fig. B-2 Second Image Used in Stereo Experiment


Fig. B-1 First Image Used in Stereo Experiment


Fig. B-3 Points in Image 1 Found by Interest
Operator


Fig. B-4 Points in Image 2 Found by Interest
Operator


Fig. B-5 Superimposed "Interest" Points,
Showing Point Motion Between Images. (Note:
Circled Points Were Considered by the Process
to Have Been Mismatched, and Were Excluded
from Further Analysis.)
ABSTRACT

Over the past two years Honeywell has developed
a context dependent automatic image recognition
system for analyzing imagery automatically
and detecting tactical as well as strategic
targets in the image. The main features of
the image recognition system are sequential frame
processing, symbolic image segmentation,
syntactic recognition, recognition of multicomponent
objects, and conflict removal. In this
paper we describe the various components of this
context dependent automatic image recognition
system and the information flow between these
components.


INTRODUCTION 

A general block diagram of the context dependent
system is shown in Figure 1. The image is first
segmented by two complementary segmentation
schemes. Next, man-made objects (MMOs) are detected
in the segmented image by a statistical technique.
The output of the MMO detector is processed by a
secondary screening target detector which further
reduces false alarms based upon true size,
temperature, etc., of the targets on the ground
plane. Sequential frame analysis is used to
improve the performance of the target detector.

A syntactic recognition scheme uses knowledge
of the structural description of the targets in
recognizing targets that are large enough to
show structural detail. For images that are too
small to show any structural detail a statistical
recognition scheme is used. Sequential frame
analysis is employed to take advantage of frame
to frame consistency in the imagery to improve
the overall performance of the system. Extended
objects such as rivers, roads, bridges, highways,
etc., are classified by the background classifier.
The outputs of the background classifier, the
small image statistical classifier, and the large
image syntactic classifier are combined by a
configuration analysis scheme to recognize
multiple component structures such as SAM sites,
vehicle convoys, and airports, and to remove conflicts.
An example of conflict removal is using the fact
that wheeled vehicles are not found in the
middle of lakes or rivers. The output of the
conflict removal function is recognized targets
that have tactical importance or are important
based on mission analysis. In the following
sections we describe the individual components
of the system in detail.


SEGMENTATION 

In picture recognition problems, the segmentation
task can be divided into two classes: texture
based and non-texture based segmentations. We
use the prototype similarity transformation
technique [1] for non-texture based segmentation. It
is a method for transforming an image into a
set of symbols, each of which represents the
relationship of a local region to other parts of
the image. A general block diagram of prototype
similarity transformation is shown in Figure 2.
Generating prototypes is equivalent to finding
a maximal set of mutually dissimilar cells. A
cell is a pixel or a collection of pixels,
depending upon the required resolution in the segmented
scene. The generated set of prototypes is used
to label each cell in the image. A priori
information about the scene is used to guide an
inference process to give meaning to each cell in
the symbolic image.
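
The two steps of prototype generation and cell labeling can be sketched as follows. The greedy selection rule, the Euclidean distance between cell feature vectors, and the dissimilarity threshold are assumptions chosen for illustration; the text does not specify how dissimilarity is measured or how the maximal set is found.

```python
import math


def generate_prototypes(cells, threshold):
    """Greedily pick a maximal set of mutually dissimilar cells.

    `cells` is a sequence of feature vectors (tuples of numbers). A cell
    becomes a prototype if it is farther than `threshold` from every
    prototype chosen so far, so every cell ends up either a prototype or
    within `threshold` of one.
    """
    prototypes = []
    for cell in cells:
        if all(math.dist(cell, p) > threshold for p in prototypes):
            prototypes.append(cell)
    return prototypes


def label_cells(cells, prototypes):
    """Replace each cell by the index (symbol) of its most similar prototype."""
    return [min(range(len(prototypes)),
                key=lambda i: math.dist(cell, prototypes[i]))
            for cell in cells]


# Hypothetical two-feature cells forming three groups.
cells = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 5.1), (10.0, 0.0)]
prototypes = generate_prototypes(cells, threshold=1.0)
symbols = label_cells(cells, prototypes)
```

The resulting list of symbols is the "symbolic image" on which the inference process would then operate.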

Segmentation  of  individual  components  of  a target 
can  also  be  done  by  using  the  prototype  similarity 
transformation  technique.  This  is  done  by 
iterative  use  of  the  technique  at  progressively 
higher  cell  resolution  as  shown  in  Figure  3. 

The texture-based segmentation scheme uses a
two-dimensional difference histogram. The difference
histogram is similar to the co-occurrence matrix
of Haralick [2]. The histogram gives a measure
of the joint probability density function of
gray level and gray level difference pairs.
Points in a window around each mode or local
maximum in this histogram correspond, ideally
speaking, to a region with a particular texture pattern
in the input image. Given a mode location in
the two-dimensional histogram M(I, ΔI), the
window around it is selected by locating the nearest
valleys or local minima around the peak. The
output of the region extraction operation, which
extracts regions in the image corresponding to
various modes in the histogram, is a set of
scattered pixels corresponding to each region. A
noise cleaning method, called "expand and shrink",
is used for cleaning the segmented images based
on connectivity.
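
A difference histogram of this kind can be accumulated as below. The text does not say between which pixels the gray level difference is taken; the sketch assumes the difference between each pixel and its right-hand neighbor, and the function name is hypothetical.

```python
from collections import Counter


def difference_histogram(image):
    """Count (gray level, gray level difference) pairs over an image.

    `image` is a list of rows of integer gray levels. The difference is
    taken as (right neighbor - pixel), which is an assumption of this
    sketch rather than a detail given in the text.
    """
    hist = Counter()
    for row in image:
        for pixel, neighbor in zip(row, row[1:]):
            hist[(pixel, neighbor - pixel)] += 1
    return hist


# Hypothetical 2 x 3 image: a flat area next to a step edge.
hist = difference_histogram([[1, 1, 3], [1, 1, 3]])
```

Modes of this histogram would then be located, and the pixels contributing to each mode's window extracted as a region.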
SEQUENTIAL FRAME ANALYSIS IN IMAGE SEGMENTATION

In both segmentation techniques, the results of
the segmentation of the previous frame are used as the
starting points of segmentation in the present
frame. In prototype similarity transformation
the initial choice of prototypes is the same
as the prototypes generated in the previous
frame. The advantage of this is that the
performance of the segmentation technique approaches
the asymptotic value as time proceeds. In the
difference histogram technique the average
intensity I and intensity difference ΔI in each of
the regions segmented in the previous frame are
used as the center points of the difference
histogram windows. This has the added advantage
of reduced computational requirement because
of reduced search time in the difference
histogram.


TARGET  DETECTION 

The output of non-texture segmentation is used
to detect and recognize targets such as tanks,
trucks, and APCs. A preliminary screening to
reject non-man-made objects is first performed
on the segmented image by a linear classifier.
Table 1 shows the features used in classifying the
segmented objects as man-made objects (MMO) or non-MMO.


Table 1: Features for MMO Detection

Number    Feature

1         Number of Scan Lines
2         Area
3         Edge Straightness
4         Max (Width/Length, Length/Width)
5         Edge Discontinuity
6         Number of Edges
7         Number of Brights
8         Position in the Initial Scan
9         Position in the Final Scan
10        Final Scan Line

The  current  performance  of  the  classifier  is 
over  90%  detection  at  4%  false  alarm. 

Secondary Screening: The detected objects are
further screened based on the true size,
temperature, or other physical properties.
Classification for secondary screening is
performed using image features, sensor parameters,
and physical dimensions of all anticipated
targets. The sensor parameters needed are the
angular subtense of the field of view (FOV),
(Ct n); the pixel dimensions of the FOV in the image
plane, (M, N); the angle of depression of the
line of sight (LOS), a; and the altitude of the
sensor location or carrying aircraft, H. Figure 4 shows the
performance of the target detector for AAD-5 and
FLIR sensors under various simulation conditions.
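
One way these sensor parameters could yield a true size is sketched below. The geometry is an assumption made for illustration (flat ground, object near the center of the FOV, small angles, extent measured across the line of sight); it is not the screening computation described by the authors, and all names are hypothetical.

```python
import math


def cross_range_extent(n_pixels, fov_rad, fov_pixels, depression_deg, altitude):
    """Approximate physical extent spanned by `n_pixels` across the line of sight.

    fov_rad        -- angular subtense of the FOV along one image axis
    fov_pixels     -- number of pixels along the same axis
    depression_deg -- depression angle of the line of sight below horizontal
    altitude       -- sensor altitude above flat ground (result has same units)
    """
    slant_range = altitude / math.sin(math.radians(depression_deg))
    angle_per_pixel = fov_rad / fov_pixels
    return n_pixels * angle_per_pixel * slant_range


# Hypothetical case: 0.1 rad FOV over 500 pixels, 30 degree depression,
# 1000 ft altitude; an object 10 pixels wide is about 4 ft across.
extent_ft = cross_range_extent(10, 0.1, 500, 30.0, 1000.0)
```

An object whose estimated extent falls outside the range of the anticipated targets could then be rejected as a false alarm.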


SEQUENTIAL  FRAME  ANALYSIS  FOR  TARGET  DETECTION 

System noise in an image recognition system affects
the performance of the system in two ways. Firstly,
the target may fail to meet the segmentation
criteria of the system, resulting in a missed target.
Secondly, the feature values of the segmented
objects may be erroneous, resulting in missed
targets as well as false alarms. Improved false
alarm and detection rates are achieved by accumulating
information regarding the locations and the feature
values of the objects from frame to frame.

In the sequential frame analysis we first determine
an interframe sequence of extracted objects
containing a given candidate target in the present
frame. We then determine if the classifier result
on the candidate target in the present frame is
consistent in a certain manner with the classifier
results on other objects from the past frames
in the sequence. An inconsistent classifier result
is modified in some prespecified manner that
yields a better classification result. This method
of "smoothing" the classifier result consists of
three distinct steps: frame alignment, interframe
object matching, and decision smoothing.

The frame alignment technique estimates the relative
translation, rotation, and scale change between
two successive frames. To estimate this frame-to-frame
change, segmented image frames and an
associated feature vector for each segmented object
in the frame are used. The two frames are then
aligned with each other by performing the appropriate
transformation. A symbolic matching of segmented
objects in the two frames is then performed to
determine the correspondence between objects in
the successive frames. The classifier decision
made on a candidate target in the present frame
is modified based on the decisions made on the same
object in the immediate past frames using a maximum
likelihood estimate.
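
The decision smoothing step can be illustrated with a simple maximum likelihood rule. The sketch assumes that the per-frame decisions on a matched object are independent, that the classifier is correct with a fixed probability, and that its errors are spread evenly over the other classes. None of these modeling choices, nor the class names, come from the text.

```python
import math
from collections import Counter


def smooth_decision(decisions, classes, p_correct=0.8):
    """Maximum likelihood class given per-frame decisions on one object.

    decisions -- classifier outputs for the same object over recent frames,
                 with the present frame last
    classes   -- all possible class labels (at least two)
    p_correct -- assumed probability that a single-frame decision is right
    """
    p_wrong = (1.0 - p_correct) / (len(classes) - 1)
    counts = Counter(decisions)

    def log_likelihood(candidate):
        hits = counts[candidate]
        misses = len(decisions) - hits
        return hits * math.log(p_correct) + misses * math.log(p_wrong)

    return max(classes, key=log_likelihood)


# Hypothetical sequence: the present frame says "clutter", but the three
# preceding frames said "target", so the smoothed decision is "target".
smoothed = smooth_decision(["target", "target", "target", "clutter"],
                           ["target", "clutter"])
```

Under these assumptions the rule reduces to a majority vote over the matched sequence; a model with class-dependent error rates would weight the frames differently.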


TARGET  RECOGNITION 

At short ranges, when the target images are large
enough to show detailed structure, linguistic
recognition techniques are used to classify the
detected targets into one of three target types:
tank, truck, and APC. When the target image is
too small to show any structural detail, a
k-nearest-neighbor (kNN) classifier is used to
classify the targets. The features used for
statistical classification of small image targets
are the detection features (Table 1) and intensity
and boundary moments.

As it turns out, the number of features required
for statistical pattern recognition is often
very large, which makes the idea of describing
complex patterns in terms of a (hierarchical)
composition of simpler subpatterns very attractive.
Also, the number of possible descriptions is very
large in the case of tactical targets from
relatively close range. In such a case it is
impractical to regard each description as defining
a class. Consequently, the requirement of
recognition is better satisfied by a syntactic
description of each class rather than by its
classification.

The assumptions in this syntactic approach to
tactical target recognition are:

• Images  of  tactical  targets  are  large 
enough  to  show  structure. 

• It  is  easier  to  recognize  target 
components  than  the  target. 

The  first  assumption  deals  with  the  sensor-target 
range.  If  the  range  is  too  large  to  show  any 
details  inside  the  target,  one  would  have  to 
resort  to  statistical  recognition  techniques. 

But as the sensor-target range decreases and
the target structure becomes discernible,
syntactic recognition schemes become feasible.

From  our  experience,  if  the  target  area  is  of 
the  order  of  one-half  to  one  percent  of  sensor 
FOV,  syntactic  recognition  schemes  are  feasible. 
This  translates  to  about  a ten  centimeter  pixel 
resolution. 

The second assumption deals with the relative
ease of recognizing a target and its components.

If it is easier to recognize a target than its
components, as would be the case when the target
image is only a few pixels, one would not employ
syntactic recognition schemes. But in low
quality images, where recognition based on the
target outline is not very reliable, a syntactic
scheme can be successfully used to recognize
targets, provided the assumption on target image
size holds. Even for good quality images,
different target orientations will result in
different target outlines. Consequently, one will
need several classifiers for each type of target. In
principle, one set of syntactic rules can be
generated to recognize the target from all aspect
angles. Syntactic recognition schemes can also
be successfully used for partially occluded
targets, where statistical recognition
schemes would conceivably fail.

We have developed a syntactic classifier [3] and
applied it to a training set of 27 FLIR images
containing tanks and trucks. We obtained 100%
recognition with zero false alarms. The technique
has yet to be tested on a large data set.

BACKGROUND  CLASSIFIER 

Recognition of extended objects is done by the nearest
neighbor (NN) rule using 9 features: length,
average width, area, height, average intensity,
maximum intensity, minimum intensity, average
contrast, and peak contrast. We are currently
developing an optimal hierarchical sub-grouping
of the features for the purpose of minimizing
computational complexity. The objective is to
retain only a subset of prototype samples for
each sub-group.
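The nearest neighbor rule over the nine features can be sketched as follows. The prototype vectors, the class labels, the Euclidean distance, and the assumption that every feature has been scaled to the range 0 to 1 are hypothetical choices made for illustration; the text does not specify them.

```python
import math

FEATURES = ("length", "average_width", "area", "height", "average_intensity",
            "maximum_intensity", "minimum_intensity", "average_contrast",
            "peak_contrast")

def nearest_neighbor(sample, prototypes):
    """Return the label of the prototype closest to `sample` (Euclidean)."""
    best_label, best_distance = None, math.inf
    for vector, label in prototypes:
        distance = math.dist(sample, vector)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label

# Hypothetical prototypes, each feature pre-scaled to the range 0..1.
prototypes = [
    ((0.9, 0.1, 0.3, 0.1, 0.4, 0.5, 0.3, 0.2, 0.3), "road"),
    ((0.8, 0.4, 0.6, 0.1, 0.2, 0.3, 0.1, 0.3, 0.4), "river"),
    ((0.3, 0.3, 0.2, 0.6, 0.5, 0.7, 0.2, 0.5, 0.6), "tree line"),
]
label = nearest_neighbor((0.85, 0.35, 0.55, 0.1, 0.25, 0.3, 0.1, 0.3, 0.4),
                         prototypes)
```

Retaining only a subset of prototypes per feature sub-group, as proposed above, would shorten the loop over `prototypes`, which dominates the cost of this rule.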


CONFLICT  REMOVAL 

Conflict removal combines object information and
relational context information to modify
classifier decisions that are inconsistent with
our world knowledge. The process requires
modeling and representing the world knowledge regarding
objects in the scene and determining an optimal
way, called the search strategy, of examining the
scene using the knowledge model. The method can
also be used for recognizing scene components
containing multiple objects. Examples of such
objects are airports, SAM sites, convoys, and
bunkers. Various methods of modeling the
knowledge and using the model to recognize
mission-oriented scenes exist in the literature [4]. The
methods depend on the particular application of
the system. We have combined appropriate concepts
from various systems and developed a knowledge
model and a search strategy for objects of military
tactical importance in imagery [5].

Conflict removal is performed by detecting
inconsistent configurations in the scene. An
example is a tank in the middle of a river.

If the structural relationship between two
recognized objects, one recognized as a tank
and another recognized by the background classifier
as a river, is such that the tank is located in the
middle of the river, then that particular
configuration is flagged as inconsistent with the
world knowledge network model. In such cases the
target, the tank in our example, is reclassified
to a "don't know" category. This conflict or
inconsistency is removed by sequential frame
analysis, which is analogous to a human operator
taking several looks at the scene of interest
when he is not confident of his recognition
result for the given scene.
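The tank-in-river example can be written as a small rule check. This is only a sketch: the representation of world knowledge as a set of forbidden (target, background, relation) triples and all names below are assumptions made for illustration, not the network model referred to in the text.

```python
# Hypothetical world knowledge: configurations that should never occur.
INCONSISTENT = {("tank", "river", "inside")}

def remove_conflicts(target_labels, relations, rules=INCONSISTENT):
    """Reclassify targets found in an inconsistent configuration.

    target_labels: dict mapping target id to its classifier label.
    relations: iterable of (target id, background label, relation) triples.
    """
    resolved = dict(target_labels)
    for target_id, background_label, relation in relations:
        if (target_labels[target_id], background_label, relation) in rules:
            # Flag the target for another look in later frames.
            resolved[target_id] = "don't know"
    return resolved

targets = {1: "tank", 2: "truck"}
relations = [(1, "river", "inside"), (2, "road", "on")]
resolved = remove_conflicts(targets, relations)
```

Targets left in the "don't know" category would then be passed to the sequential frame analysis described above.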


SYSTEM  SIMULATION 

Until now we have simulated, optimized, and
tested the components of the context-dependent
automatic image recognition system individually.

We have initiated the task of simulating the
entire system as a whole and evaluating the
performance of the system while characterizing
the impact of the performance of one component
on other components in the system. We have
identified a test data base for the purpose. To
help speed up the system simulation we will use
the I²S Model 70 video data processing system
recently acquired and interfaced with the Level
6/43 computing system of the Image Processing
Laboratory of the Systems and Research Center.
The special architecture of the Model 70 system will
enable us to simulate parallel processing of
certain functions such as prototype similarity
segmentation and texture-based segmentation. This
will help us characterize the need for digital
parallel processing in an advanced automatic
image recognition system.

Abstract - This paper describes the work
undertaken during the past six months by Hughes
Aircraft Company under a subcontract from the
University of Southern California, Image Processing
Institute, for the DARPA Image Understanding
program. Two principal areas are discussed:
(1) the work on a charge-coupled-device/metal-oxide-semiconductor
test circuit to develop five
real-time preprocessing operators and (2) the
study and analysis of a real-time, high-level
processor architecture that measures texture. Also,
the design and development of a real-time
processing facility based at Hughes Research
Laboratories for performance test and evaluation is
described.


I. INTRODUCTION

During the past six months, we continued our
work to develop custom-designed integrated circuits
for real-time implementation of image-understanding
algorithms. The work has centered on three areas:
the detailed design and layout of a third test
chip, TCIII; the development of new concepts for
more advanced (higher level) processing operations
(including a texture chip); and the design and
construction of the necessary circuits, such as
clock drivers, to operate the processors.

In the previous phases of this program, we
developed concepts and test circuits for "real-time"
(equivalent to television data rates)
processing of "low-level," or preprocessing, algorithms,
including edge detection, unsharp masking, local
averaging, adaptive stretch, and binarization. Our
approach is to employ fast analog preprocessors
integrated at or close to the sensor itself and
then to follow this by programmable digital
processing using highly regular LSI or VLSI designs.

In the preprocessing stage, which includes image
enhancement, feature analysis, etc., where a
limited number of operations are required on each
pixel, the effective data rates are extremely high
(typically in excess of 10^8 operations/sec). This
exceeds the current throughput of state-of-the-art
high-density digital ICs (10^7 operations/sec). For
this processing, a direct analog approach that
maintains an accuracy equivalent to 6 bits can be
provided directly on the sensor itself. At the
higher level, where many operations are required
on each data point, but where the number of data
elements is reduced, a digital approach providing
an increased accuracy and dynamic range is most
appropriate.

Section II of this paper describes our
continuing work on Test Chip III to develop
high-speed preprocessing functions. Section III
describes an approach to higher level processing
(such as texture and segmentation), and Section IV
discusses the work we have undertaken on our
image-processing facility to enable us to operate our
custom-built ICs in real time.

II. DESIGN AND FABRICATION OF TEST CHIP III


We have investigated five circuits for
inclusion in our current test chip. These are a 3 x 3
Laplacian operator, a 7 x 7 kernel (which is
currently being implemented as an edge detector but
can be mask programmed to perform other operations,
such as the binary checkerboards or unsharp
masking), a 5 x 5 programmable filter (which we intend
to integrate with a commercial microprocessor), a
5 x 5 "cross-shaped" median, and a large bipolar
convolutional array for 26 x 26 pixel convolutions.


For each of these, we have developed circuit
concepts that will allow the data to be processed
at real-time data rates. Circuit simulations that
evaluate the accuracy and speed, and hence the
dynamic range, have been completed for each circuit.
The detailed designs and layouts of these operators
have now been completed, and we anticipate having
processed parts by July, which should allow the
preliminary evaluation to be started before the end
of this phase of the contract in September.

The technology used to implement the
algorithms is n-channel, two-phase metal-oxide
semiconductor (MOS) and charge-coupled device
(CCD). The full chip size is approximately 225
mils x 225 mils, and conventional photolithography
is used, resulting in a linewidth of ~5 µm. With
this resolution and by using surface-channel
technology, a clock rate of 7.5 MHz should be possible.
A block schematic and a brief description of each
circuit are given below.

A. Laplacian Operator

The Laplacian operator is a bipolar weighting
scheme A operating on a 3 x 3 array of picture
elements, given by


to produce the convolution output A * P, where P is
the unprocessed image array. It is used for
crispening and edge sharpening. It can be implemented
directly at the sensor using a two-dimensional CCD
array consisting of a set of linear transversal
filters. A schematic of the system is shown in
Figure 1. Each filter is a two-phase, n-channel
device with 18 gates. The added latency time for
this device is equivalent to ~0.5 pixels
(~0.1 µsec). This is in addition to the inherent
delay of the algorithm of approximately one video
line (~63 µsec).

The circuit uses the floating gate technique
to sense nine adjacent charge packets representing
the array of 3 x 3 adjacent pixels. Three
adjacent lines of charge, representing the video data,
are clocked through the array, and the weighted
sum of the charge or pixel magnitudes at each clock
sample is applied to an "on-chip" sample and hold.
The voltage on the floating gate array, sensed by
the sample and hold at the nth clock period, T, is

    V_o(nT) = \frac{1}{C_T} \sum_i A_i Q_i(nT)

where C_T is the total capacitance of the array and
connecting bus-line, Q_i(nT) is the total charge
representing each picture element under the gates
at time nT, and A_i is the effective gate area at
each location. For correct operation, the length
of each gate must be proportional to the elements
of A. Since the length of each gate must be a
positive value, the conventional approach would be
to implement A as

    2 4 2

and to connect each of the gates to either a
positive or negative bus line of a differential
amplifier. In practice, the differential amplifier,
which itself can be implemented in n-MOS technology,
becomes a significant portion of the total area of
the chip, typically comparable to or perhaps even
larger than the CCD array itself. Further, the
differential amplifiers are themselves
voltage-controlled devices, and the charge-to-voltage
transitions necessary can introduce nonlinearities and
noise. For this reason, we use the Hughes-patented
displacement-charge-subtraction (DCS) technique [3],
which implements A directly as


and is low capacitance and eliminates
"common-mode" output. This technique has been shown to
provide up to 90-dB dynamic range and 68-dB common
mode rejection.

The processed outputs from the array are fed
directly to an "on-chip" sample and hold circuit,
which eliminates clock feedthrough and rejects
coherent noise sources. These circuits have been
designed and simulated to run at a 7.5-MHz data
rate with accuracy equivalent to 6 bits.

B. 7 x 7 Mask Programmable Kernel

In the April 1978 Semi-Annual Report, several
processing algorithms were discussed that use a
7 x 7 array as a binary checkerboard weighting for
image decomposition or as a version of unsharp
masking or deblurring. Because of the interest in
this array size, we have built a mask-programmable
array that can be used to form a variety of
operators. The basic concept consists of a 7 x 7 array
of CCD stages that can be operated from seven
parallel, adjacent video lines. The basic
structure is shown in Figure 2. With this basic
structure, we can use a mask change to perform a variety
of different operations. Basically, only those
levels that determine the filter weightings and the
bus line interconnection need be changed to provide
each of the operations discussed in the April 1978
Semi-Annual Report. This technique provides a very
flexible and cost-effective way of performing a
wide variety of 7 x 7 algorithms.

Figure 2. 7 x 7 mask-programmable array.

Initially, we have designed the mask to
perform a 7 x 7 edge-detection operation with radially
symmetric weights. The weights used are given by


and the edge-detected output can be written as

    S = |P * H_x| + |P * H_y|                    (6)

A detailed view of one linear CCD array that achieves
this is shown in Figure 3. Here the weightings
are arranged to be inversely proportional to their
distance from the center of the array. In this
way, the edge value is concentrated ("focused")
towards the center picture value, and the larger
array size gives greater immunity to noise in the
sensed image.
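The edge operator of this kind can be sketched numerically. The actual weight values are not reproduced here, so the kernel below is an assumption: each weight has magnitude equal to the reciprocal of its distance from the center, its sign follows the horizontal offset, and the vertical kernel is taken as the transpose. The output is the sum of the magnitudes of the two directional responses.

```python
import math

def edge_kernel(size=7):
    """Horizontal-gradient kernel with weights of magnitude 1/distance."""
    centre = size // 2
    kernel = []
    for r in range(size):
        row = []
        for c in range(size):
            dx, dy = c - centre, r - centre
            if dx == 0:
                row.append(0.0)  # the centre column does not contribute
            else:
                row.append(math.copysign(1.0 / math.hypot(dx, dy), dx))
        kernel.append(row)
    return kernel

def edge_strength(window, hx):
    """Return |P*Hx| + |P*Hy| for one 7 x 7 window, with Hy = transpose(Hx).

    The sums are correlations; for these antisymmetric kernels the flip
    used by a true convolution only changes a sign that abs() removes.
    """
    size = len(hx)
    cells = [(r, c) for r in range(size) for c in range(size)]
    gx = sum(window[r][c] * hx[r][c] for r, c in cells)
    gy = sum(window[r][c] * hx[c][r] for r, c in cells)
    return abs(gx) + abs(gy)

hx = edge_kernel()
flat = [[5] * 7 for _ in range(7)]
step = [[0, 0, 0, 0, 10, 10, 10] for _ in range(7)]
flat_response = edge_strength(flat, hx)
step_response = edge_strength(step, hx)
```

A uniform window gives a response of essentially zero, while the vertical step edge gives a large response, dominated by the weights nearest the center.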

C. 5 x 5 Programmable Array

To achieve simultaneously the high-speed
capability and the added flexibility of
programmable operations, we have included a
data-programmable 5 x 5 array as shown in Figure 4. The
programmable approach should allow many of the
image-understanding operations of interest on a 5 x 5
array to be performed with one circuit. The
concept is shown in Figure 5. It has been designed
to accept data at the standard 7.5-MHz video rate
and to enable the weighting functions to be
changed at the frame rate of 30 Hz. Since each
weighting node has been brought out directly to an
external pin of the 64-pin package, we can
independently vary each element of the 25-point
convolution with an accuracy of ~2%. Further, since
our aim is to drive the weights from a commercial
microprocessor, we can in effect cancel out many of
the processing inaccuracies and nonlinearities to
obtain optimum performance, enabling us to study
the analog-digital or low-level/high-level
interfaces.

Figure 3. Mask structure programmed for 7 x 7 edge detection.

Figure 4. Schematic of test system for programmable
filter incorporating micro-computer control.

D. 5 x 5 "Plus-Shaped" Median Filtering

Both USC IPI and Hughes Research Laboratories
(HRL) have done extensive simulation on median
filtering. The median operator is an obvious
candidate for preprocessing and can be very useful
both for rejecting impulsive noise and for overcoming
defects in the imaging system. Both HRL and IPI
studies show that a "plus-shaped" array with 9
pixels is optimal for many of the images of
interest. Perhaps the most direct approach to a
median filter is to perform a sort operation and
then choose the fifth element in the stack (for a 5
x 5 cross). To do this, n(n - 1)/2 = 36
comparators are required for the n = 9 pixels. The
conventional approach is to form the ladder network,
or "bubble-sort" array, shown in Figure 6. Here each
comparator module (CM) has three basic states,
depending on the relative magnitude of the two inputs
"a" and "b." In the configuration shown, if a > b or
a = b, the CM acts simply as two parallel,
one-element delays. For b > a, however, it acts as a
cross-bar switch and reverses the two outputs. The
effect after 36 comparisons is to provide 9 parallel
outputs ordered by increasing magnitude, with the
center output being the median value.

This structure can be built directly in MOS/LSI
using MOSFETs to provide a result equivalent
to 7 bits at a rate of 7.5 MHz. It can also be
built into a modular design, which will allow the
array size to be increased by adding parallel
chips. Our present design is a direct MOS
implementation that uses external delays to determine
the pixel array shape. In this way, the operation
can be performed over a variety of kernels.
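The comparator count for the plus-shaped median can be checked with a short software model. This is a sketch of the bubble-sort idea only: each compare-and-exchange stands in for one comparator module, and the function names and the test window are illustrative assumptions rather than a description of the MOS circuit.

```python
def plus_shaped_pixels(window):
    """Return the 9 pixels on the centre row and centre column of a 5 x 5 window."""
    centre = 2
    row = list(window[centre])
    column = [window[r][centre] for r in range(5) if r != centre]
    return row + column

def median_by_bubble_network(pixels):
    """Sort with a full bubble-sort ladder; return (median, comparison count)."""
    values = list(pixels)
    n = len(values)
    comparisons = 0
    for sweep in range(n - 1):
        for k in range(n - 1 - sweep):
            comparisons += 1
            if values[k] > values[k + 1]:
                # Cross-bar state: exchange the two outputs.
                values[k], values[k + 1] = values[k + 1], values[k]
    return values[n // 2], comparisons

# Hypothetical 5 x 5 window holding the values 0..24 in raster order.
window = [[r * 5 + c for c in range(5)] for r in range(5)]
median, comparisons = median_by_bubble_network(plus_shaped_pixels(window))
```

For 9 inputs the ladder performs 8 + 7 + ... + 1 = 36 compare-and-exchange operations, which agrees with the n(n - 1)/2 figure, and the fifth sorted value is the median.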

E. 26 x 26 Bipolar Convolution Filter

We have included on this chip a processing
algorithm suggested by Professor David Marr and
his colleagues at MIT. From a technology
standpoint, it is interesting because it has a
significantly larger kernel size than the arrays built to
date and requires high accuracy. The full kernel
consists of a 26 x 26 element array with a
weighting ratio of approximately 1:150. We have used
the circular symmetry of the array to build only a
26 x 13 element kernel and intend to use two chips
with modified input structures for the full
convolution. Further, we have decided to build the
array with three separate outputs, which will be
helpful in the test and evaluation stage and, more
significantly, will enable us to scale, or
normalize, the weighting to achieve higher accuracy. A
schematic of the circuit is shown in Figure 7.

Figure 5. Schematic of 5 x 5 programmable filter.

Figure 6. Schematic of real-time median filter
operator for 5 pixels (inset: operation of CM module).

III. CONCEPT DEVELOPMENT FOR HIGHER-LEVEL
PROCESSING

We have spent considerable time in this phase
of the program addressing the processing
requirements for the high-level operations. As a
specific operation to analyze, we have chosen the
texture processor of Professor W.K. Pratt. This
operation (a schematic is given in Figure 8)
basically consists of a Sobel or Laplacian operation
followed by moment operations of the form

    M_n = \frac{1}{N} \sum_{i=1}^{N} (p_i - \bar{p})^n

where p_i is the analog picture intensity, \bar{p} is the
average picture intensity, N is the number of
pixels in the kernel, and n is the order of the
moment (typically 1 through 4).

Assuming an input image with 6-bit dynamic
range, the required output dynamic range could under
the worst condition be 24 bits.

At first sight, it might appear that if
(p_i - \bar{p})^n were calculated directly, (p_i - \bar{p}) would
typically be a small number since the individual
picture elements p_i will most probably be closely
distributed about the local mean \bar{p}. (This will
be true for those images that have a low variance.)
However, this approach causes considerable
difficulty in the computation. For example, for each


new picture element, which will occur approximately
every 130 nsec, we will be required to calculate
\sum (p_i - \bar{p})^n in its entirety because \bar{p} will also
change at the pixel rate. For a 15 x 15 window,
calculating just the first four moments, this will
result in a throughput of 1.3 x 10^10 operations
per second, the majority of which will be
multiplications. This is clearly an inappropriate approach
since a high-speed multiply might take 50 nsec or
more in the fastest high-speed emitter-coupled-logic
technology. Several hundred channels,
requiring a very large amount of power and circuitry, would
be required in a parallel architecture.

Clearly a preferable approach is to calculate
the non-centered moments

    m_n = \frac{1}{N} \sum_{i=1}^{N} p_i^n

In this way, the partial products can be calculated
and stored, and with each new pixel we are required
only to subtract the contributions of the oldest
pixel from the summation and to add the newest.
This can reduce the calculation rate by more than
two orders of magnitude and enable real-time or
near real-time operation.

IV. DEVELOPMENT OF TEST FACILITIES

We have now completed the design and begun
fabrication of the facilities we will require to
test and provide reference evaluation of this new
chip. A schematic of this system is given in
Figure 9.

Figure 7. 26 x 13 pixel array implemented on test chip III.

Figure 8. Concept of texture processing using
moment calculations.
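The running update of the non-centered moments described in Section III can be sketched in software. The example below makes several simplifying assumptions that are not in the text: the window is one-dimensional rather than 15 x 15, the pixels are integers, and the centered moments are recovered from the stored power sums by a binomial expansion. The class and method names are illustrative.

```python
from collections import deque
from math import comb

class SlidingMoments:
    """Keep running sums of p**k over the most recent `window` pixels."""

    def __init__(self, window, max_order=4):
        self.window = window
        self.max_order = max_order
        self.pixels = deque()
        self.power_sums = [0] * (max_order + 1)  # power_sums[k] = sum of p**k

    def push(self, p):
        # Add the newest pixel, then subtract the oldest one if the window is full.
        self.pixels.append(p)
        for k in range(self.max_order + 1):
            self.power_sums[k] += p ** k
        if len(self.pixels) > self.window:
            old = self.pixels.popleft()
            for k in range(self.max_order + 1):
                self.power_sums[k] -= old ** k

    def central_moment(self, n):
        # Expand (p - mean)**n so that only the non-centered moments are needed.
        count = len(self.pixels)
        raw = [s / count for s in self.power_sums]  # raw[k] = mean of p**k
        mean = raw[1]
        return sum(comb(n, k) * raw[k] * (-mean) ** (n - k) for k in range(n + 1))

moments = SlidingMoments(window=3)
for pixel in [1, 2, 3, 10, 20]:
    moments.push(pixel)
variance = moments.central_moment(2)
```

Each new pixel costs a fixed number of operations per moment order, independent of the window size, which is the source of the saving described above.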
Figure 9. Schematic of test facility required for test chip III.

ABSTRACT

This paper summarizes recent work performed
for Carnegie-Mellon University on the investigation
of very large scale integration (VLSI)
implementations for image processing. Discussions of basic
memory architectures needed for two-dimensional
operators, reconstruction of block truncation coded
imagery data, and the implementation of a
programmable sum of products operator are presented.


INTRODUCTION 

The concept of a VLSI implementation of a
digital image processor based on multiple real-time
arithmetic logic units (ALUs) and buffer memories
was presented at the last workshop. The
implementation of the appropriate buffer memories and of two
image processing functions, reconstruction of block
truncation coded data and a programmable sum of
products operator, is discussed below.

MEMORY  ARCHITECTURES 

In image processing systems having single-line
video inputs, memory is required to store previous
pixel values for subsequent processing by
two-dimensional operators such as edge detectors, median
filters, etc. For our investigation a J x K x B bit
two-dimensional input image and M x N x B bit
two-dimensional operators are assumed.

For non-interlaced single-line video, the
memory architecture of Figure 1 provides the
serial-to-parallel conversion needed for
two-dimensional operators. The memory consists of (M-1)
K-stage shift registers which format the data as M
parallel input lines to the two-dimensional
processors. The first line of the data need not be buffered.
The buffer memory may be integrated onto the same
IC as the processor to eliminate the need for M
by B input pins on the processor IC. This integration
would reduce the number of processor functions that
could be integrated on a single IC; however, it may
be advantageous to duplicate the buffer memory on
several different processors to reduce the total
pin count and thus the size of the system.
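The serial-to-parallel conversion for non-interlaced video can be modeled with a chain of delay lines. The sketch below assumes that each shift register holds exactly one video line, and the function and parameter names are illustrative. The current pixel is passed through unbuffered, and each of the M-1 registers adds one line of delay.

```python
from collections import deque

def serial_to_parallel(pixels, line_length, m):
    """Yield M vertically adjacent pixels (oldest line first) per input pixel.

    pixels: raster-ordered pixel stream.
    line_length: number of stages in each shift register (one video line).
    m: number of parallel output lines, using m - 1 shift registers.
    """
    delay_lines = [deque([0] * line_length) for _ in range(m - 1)]
    for pixel in pixels:
        column = [pixel]          # the newest line is not buffered
        carry = pixel
        for line in delay_lines:
            line.append(carry)    # shift the value into this register
            carry = line.popleft()  # value delayed by one more video line
            column.append(carry)
        yield tuple(reversed(column))

# Three lines of four pixels each, numbered 0..11 in raster order.
columns = list(serial_to_parallel(range(12), line_length=4, m=3))
```

Once two full lines have been loaded, each output is a vertical column of three pixels; for example, the ninth output is (0, 4, 8). Outputs produced before the registers fill contain the initial zeros.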

For interlaced single-line video, a frame
buffer memory is required to format the data into
non-interlaced video. Two memory architectures for
interlaced video inputs are shown in Figure 2. In
Figure 2(a) a single-line-in, single-line-out buffer
inputs the video into the serial-to-parallel memory
described in Figure 1. A single frame buffer memory
may be used to store each video frame by using an
alternating non-interlaced/interlaced addressing
scheme to output one frame of reformatted data while
reading interlaced data from the following frame.

In the memory architecture shown in Figure
2(b), both the frame buffer function and the
serial-to-parallel conversion function may be accomplished
with a single frame store memory. This memory has
a single-line input and an M-line parallel output.
This approach requires a large pin-out (M x B).

Figure 1. Memory Architecture for Non-Interlaced
Single Line Video

Figure 2. Memory Architecture for Interlaced Single
Line Video (a. Separate Frame Buffer and
Serial-to-Parallel Converter)


BLOCK TRUNCATION CODING

Block truncation coding* techniques can be
used to reduce the bandwidth needed to transmit
imagery data. The sample mean and variance of small
blocks of an image are used to statistically
reconstruct the image from binarized image blocks. The
following equations define the sample mean and
variance, respectively.


The binary image block is chosen such that
all pixel values greater than \bar{X} are set to 1 and
all others are set to 0. In reconstruction all
0's are replaced by


and all 1's are replaced by


where q is the number of pixel values greater than
the mean, \bar{X}.

Substituting Equation (2) into Equations (3)
and (4), the following equations result:


A digital implementation of the block
truncation encoding algorithm for 4 by 4 pixel blocks
has been investigated and was discussed at the last
workshop.

A simplified block diagram of the
reconstruction circuit is shown in Figure 3. The
reconstruction circuit receives the binary image block, counts
the number of 1's (q), and uses a ROM look-up
procedure to calculate the square root quantity. A final
adder/subtractor completes the calculation of A and
B. The major portion of the circuit is the required
ROM. The size of the ROM is determined by V and q.
Since q can be any number from 0 to N and V may be
8 bits, the ROM is (256 x N) x 8 bits, assuming an
8-bit answer is desired. For the case studied,
N = 16; therefore the ROM must have 4096 words. Further
investigations have shown that the largest value possible
for the square root quantity in Equations (5) and
(6) is \sqrt{255 x 15} \approx 62; therefore only a
4096 x 6 bit ROM is necessary.

Figure 3. Implementation of Block Truncation
Reconstruction

PROGRAMMABLE SUM OF PRODUCT OPERATOR

Many image processing algorithms require
operations of the form

    y = \sum_i a_i X_i                           (8)

where the a_i's represent a set of fixed weighting
coefficients and the X_i's represent a set or a
sequence of input values.

Equation (8) can be implemented using
digital multipliers and adders; however, the size and
power required to perform the multiplication at video
data rates are prohibitive for most image processing
applications. Much investigation has been performed
recently on two similar techniques for the
realization of Equation (8) with no digital multipliers
[3,4,5,6]. The distributed arithmetic technique
implements the sliding sum of products (convolution) of
an input word sequence with a set of weighting
coefficients. The ROM-accumulator (RAC) technique implements
a nonsliding sum of products of an input word set
with a set of weighting coefficients. Both methods
use a table look-up procedure.
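For the block truncation reconstruction discussed in the BLOCK TRUNCATION CODING section, the following sketch uses the standard block truncation coding formulas from the general literature, in which the two output levels are chosen so that the block mean and variance are preserved. These formulas are an assumption here, since the equations of this paper are not reproduced in the text; they are consistent with the square root bound of about 62 quoted for N = 16 and an 8-bit variance. All function names are illustrative.

```python
import math

def btc_encode(block):
    """Return (mean, variance, bit map) for one block of pixel values."""
    n = len(block)
    mean = sum(block) / n
    variance = sum((x - mean) ** 2 for x in block) / n
    bits = [1 if x > mean else 0 for x in block]
    return mean, variance, bits

def btc_reconstruct(mean, variance, bits):
    """Rebuild a block whose mean and variance match the coded values."""
    n = len(bits)
    q = sum(bits)  # number of pixels above the mean
    if q == 0 or q == n:
        return [mean] * n  # flat block: nothing to split
    low = mean - math.sqrt(variance * q / (n - q))    # replaces the 0's
    high = mean + math.sqrt(variance * (n - q) / q)   # replaces the 1's
    return [high if bit else low for bit in bits]

# Hypothetical 4 x 4 block, listed in raster order.
block = [12, 14, 200, 210, 11, 13, 190, 205, 10, 15, 180, 220, 9, 16, 185, 215]
mean, variance, bits = btc_encode(block)
rebuilt = btc_reconstruct(mean, variance, bits)

# Largest square root term for N = 16 and an 8-bit variance value.
rom_bound = math.sqrt(255 * 15)
```

In a hardware version the two square root terms would be read from the ROM addressed by the variance and q, leaving only the final add and subtract.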


The  operation  performed  by  the  distributed 
arithmetic  technique  Is  the  convolution  defined  by 


yn  “ ^ al*n-l  (9) 

l- 0 

where  Xn_£  Is  a B-blt  word  represented  by 
B-l 

*(»-«)/  (10) 

J-0  J J 

Substituting  Equation (10)  Into  Equation  (9)  yields 


B-l  L-l 

yn  - S Xw  alhn-i)i  2-> 
j-o  e-o 


The  penalty  for  blocking  the  aeaory  Is  the  addition 
of  Q-l  digital  adders.  Figures  shows  a Q-2  aeaory 
structure  block  diagram  where  L-4  words. 

The  frequency  of  table  look-up  can  be  reduced 
by  multiple-bit  addressing  of  the  aeaory,  l.e. , by 
using  M bits  from  each  word  to  address  the  aeaory. 
The  number  of  table  look-ups  is  thus  reduced  to 
Bx/M.  However,  the  aeaory  size  is  Increased  to  2ML 
words  with  each  word  Ba+log7L>+  M bits.  Figure  6 
shows  such  a structure  for  which  M-2.  The  last  two 
stages  of  each  delay  line  are  used  to  form  a2L-blt 
memory  address. 


Figure  4.  Block  Diagram  of  Implementation  of  Single- 
Memory  Storage  of  Partial  Products 


Since  the  values  a^  are  fixed,  the  2L  possible  values 
of  the  bracketed  term  of  Equation  (11)  may  be  calcu- 





lated  a priori  and  stored  in  a memory.  For  each  j 
of  the  outer  summation,  the  value  of  the  bracketed 
term  Is  recalled  from  the  memory  location  whose  ad- 
dress is  formed  by  the  L bit  binary  word 


The  word  stored  In  this  location  is  given  by 


Figure 5. Block Diagram of Two-Memory Implementation


These values recalled from memory are weighted by
the factor $2^{-j}$ and summed over j.


Figure 4 shows a block diagram that implements
Equation (9) using the table look-up algorithm of
Equation (11). The sequential $B_x$-bit input signal,
X, is loaded bit-serially into L cascaded, $B_x$-bit
delay lines. The last stage of each delay line is
used to form the L-bit address for the memory. The
shift and accumulate function performs the binary
weighting and summation over j. The size of the
memory needed for this implementation is $2^L$ words.
If the weighting coefficients, $a_l$, are of $B_a$-bit
accuracy, the word size of the memory must be $B_a + \log_2 L$
bits to prevent overflow. The number, and thus the
frequency, of table look-ups for this implementation
is proportional to the input word length ($B_x$). The
required memory can be reduced by partitioning the
word sequence into Q blocks. This requires Q memories
of size $2^{L/Q}$ words with each word $B_a + \log_2(L/Q)$ bits.
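The table look-up procedure described above can be illustrated in software. The following is a minimal sketch, not the hardware design: it assumes unsigned integer input words with bit weights 2^j (least significant bit first) rather than the fractional weights 2^-j of Equation (10), which changes only the scaling. The function names (`build_table`, `da_convolve`, `direct_convolve`) are hypothetical.

```python
def build_table(coeffs):
    """Precompute the 2**L partial sums (the bracketed term of Equation (11)).

    Entry `addr` holds the sum of coeffs[l] over every l whose bit is set
    in `addr`.
    """
    L = len(coeffs)
    return [sum(coeffs[l] for l in range(L) if (addr >> l) & 1)
            for addr in range(2 ** L)]


def da_convolve(samples, coeffs, bits):
    """Compute y[n] = sum_l coeffs[l] * samples[n - l] by table look-up.

    Assumption: samples are unsigned integers below 2**bits, and samples
    before n = 0 are taken as zero.
    """
    L = len(coeffs)
    table = build_table(coeffs)
    outputs = []
    for n in range(len(samples)):
        # The L most recent words play the role of the L delay lines.
        window = [samples[n - l] if n - l >= 0 else 0 for l in range(L)]
        acc = 0
        for j in range(bits):  # one table look-up per input bit
            addr = 0
            for l in range(L):
                addr |= ((window[l] >> j) & 1) << l
            acc += table[addr] * (1 << j)  # shift and accumulate
        outputs.append(acc)
    return outputs


def direct_convolve(samples, coeffs):
    """Reference sum of products, used only to check the look-up version."""
    return [sum(a * samples[n - l] for l, a in enumerate(coeffs) if n - l >= 0)
            for n in range(len(samples))]
```

In this sketch the table has 2^L entries and each output costs one look-up per input bit, which mirrors the memory size and look-up count stated in the text.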


Figure 6. Block Diagram of Multiple-Bit Addressing
Implementation




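In general

$$\text{Memory Size} = Q \, 2^{ML/Q} \ \text{words} \qquad (14)$$

$$\text{Memory Word Size} = \begin{cases} B_a + \log_2(L/Q) \ \text{bits}, & M = 1 \\ B_a + \log_2(L/Q) + M \ \text{bits}, & M > 1 \end{cases} \qquad (15)$$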


$$y = \sum_{j=0}^{B-1} \left[ \sum_{l=0}^{L-1} a_l \, x_{l,j} \right] 2^{-j} \qquad (20)$$


$$\text{Number of Digital Adders} = Q - 1 \qquad (16)$$

$$\text{Number of Table Look-Ups} = B_x / M \qquad (17)$$

where: $B_a$ = the number of bits in each weighting coefficient
$B_x$ = the number of bits in each input word
L = the number of words used
Q = the number of blocks used


The RAC technique is a subset of the distributed
arithmetic technique. Convolution of an input
word sequence with the weighting coefficients, $a_l$,
is not performed in the RAC technique. Figure 8
shows an implementation of Equation (8) using the
RAC technique. The memory address is obtained from
the bit-serial words, X, as shown. Equations (14)
through (18) are still valid for the RAC technique.
As in the distributed arithmetic implementation, the
memory size and number of table look-ups can be
reduced using memory blocking and multiple-bit
addressing, respectively.


M = the number of bits from each word used
for addressing.
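The trade-off expressed by Equations (14) through (17) can be tabulated with a few lines of code. This is a sketch only: the function name `sizing` is hypothetical, rounding log2(L/Q) up to a whole bit is an assumption, and the 8-bit coefficient width used in the usage note is an assumed value, not one given in the text.

```python
import math


def sizing(b_a, b_x, L, Q=1, M=1):
    """Evaluate Equations (14)-(17) for a table look-up sum of products.

    b_a: bits per weighting coefficient, b_x: bits per input word,
    L: number of words, Q: number of memory blocks,
    M: bits taken from each word per look-up.
    """
    if L % Q != 0 or b_x % M != 0:
        raise ValueError("Q must divide L and M must divide b_x")
    words_per_block = L // Q
    # Assumption: log2(L/Q) is rounded up to a whole number of bits.
    growth_bits = math.ceil(math.log2(words_per_block))
    return {
        "memory_words": Q * 2 ** (M * words_per_block),              # Eq. (14)
        "word_bits": b_a + growth_bits + (M if M > 1 else 0),        # Eq. (15)
        "adders": Q - 1,                                             # Eq. (16)
        "lookups": b_x // M,                                         # Eq. (17)
    }
```

For the configuration of Figure 7 (B_x = 6, L = 4, Q = 2, M = 2) with an assumed B_a = 8, `sizing(8, 6, 4, Q=2, M=2)` gives 32 memory words in total, 11-bit words, one adder, and three look-ups per output.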

A compromise among memory size, number of adders, and
the number of table look-ups can be made to ease
implementation. Figure 7 shows a block diagram
implementation for $B_x$=6, L=4, Q=2, and M=2. The
contents of the memories are a function of L, M, Q,
and the a's. In general, the contents of location
Z in memory q are given by

$$C(q,Z) = \sum_{l=0}^{L/Q-1} \sum_{m=0}^{M-1} a_{qL/Q+l} \, 2^{-m} \, Z_{Ml+m}, \qquad Z_{Ml+m} \in \{0,1\} \qquad (18)$$


Several architectures and LSI technologies are
being investigated as candidates for implementing
a programmable sum of products operator. Many architectures
have been deemed impractical because of
the high internal operating frequency required.
Investigation of other architectures which maximize
processing time for the on-chip digital circuitry
is continuing.


Figure  7.  Block  Diagram  of  Two-Memory  With  2-Bit 
Address  Implementation 


Figure  8.  Implementation  of  Programmable  Multiply 
and  Sum  Using  ROM-Accumulator  Technique 


The ROM-accumulator (RAC) technique implements Equation (8)
exactly, i.e., without convolution. Representing
each input word, $x_l$, by

$$x_l = \sum_{j=0}^{B-1} x_{l,j} \, 2^{-j} \qquad (19)$$

and substituting into Equation (8) yields


REAL  TIME  MEDIAN  OPERATOR  UNIT 

The 5x5 median of medians operator (reference 1) discussed
in the last workshop is being implemented using
off-the-shelf components. The breadboard is being designed
for real time operation using a commercial
TV camera as the sensor. The breadboard will allow
evaluation of the median of medians operator and
provide inputs for the design of the integrated
version. A block diagram of the breadboard is shown
in Figure 9. The memory needed to buffer 4 lines
of video will be partitioned to allow multi-resolution
operation. The one line memory after the median
operator is needed to refresh the display in low
resolution operation. Operation will be performed
on a single field of video to eliminate the need for a
frame buffer memory.
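As an illustration of the operator being breadboarded, the sketch below computes a median of medians over a 5x5 window. The definition is not restated in this section, so the sketch assumes the common form: take the median of each of the five rows, then the median of those five values. The function names and the choice to copy border pixels unchanged are assumptions for illustration.

```python
def median5(values):
    """Median of exactly five values."""
    return sorted(values)[2]


def median_of_medians_5x5(window):
    """Assumed definition: median of the five row medians of a 5x5 window."""
    return median5([median5(row) for row in window])


def filter_image(image):
    """Apply the operator at every interior pixel of a list-of-lists image.

    Assumption: pixels within two rows or columns of the edge are copied
    unchanged, since the text does not say how borders are handled.
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(2, rows - 2):
        for c in range(2, cols - 2):
            window = [image[r + dr][c - 2:c + 3] for dr in range(-2, 3)]
            out[r][c] = median_of_medians_5x5(window)
    return out
```

Only five rows are needed at any time, which is consistent with buffering four lines of video plus the line currently arriving.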

CONCLUSIONS 

This paper discussed the basic memory architectures
needed for two dimensional operators, the
reconstruction of block truncation coded imagery
data, and the implementation of a programmable sum
of products operator. The design of a digital median
operator breadboard for a 5x5 pixel window and 8 bit
accuracy was described. Investigation of the digital
integrated circuit implementation of an IC capable
of calculating the statistics of an image, i.e.,
the mean and variance or their analogs, the median
and interquartile range, at real time video data rates
has begun, and the investigation of architectures
for the programmable sum of products operator is
continuing.
ABSTRACT

Under contract to the University of Maryland,
Westinghouse has been implementing algorithms for
the image understanding process. The program is
sponsored by DARPA and monitored by the Army's
Night Vision and Electro-Optical Laboratory. Our
objective is the examination of the latest bit-sliced
technology and the design of innovative
architectures that are highly parallel, high-speed,
and fault tolerant, and that require both a small
instruction set and a small volume. A key consideration
is the relaxation process, which has been under
intensive investigation at Maryland, and which
offers the prospect of improved decisionmaking
at the expense of computational load.

This paper describes an architecture for implementing
relaxation in near real time and a
hardware family of arrays to facilitate the architecture.


INTRODUCTION 

We first examine the relaxation algorithms,
at the pixel level, as described by the University of
Maryland. The computational structure is of the
form of a scalar multiplied by a matrix multiplied
by a vector. Implementation in a Systolic* Array
architecture is described and hardware counterparts
are found in Universal Arrays.** Because relaxation
represents the new generation of higher
order algorithms, Universal Array implementation
is appropriate to produce speeds necessary to show
feasibility.


*"Systolic Array" is a term used by H. T. Kung to
describe a network of processors which rhythmically
compute and pass data through the system. See
Reference 3.

**"Universal Array" is the name given by Westinghouse
to a functional array. See Reference 4.


RELAXATION  AT  PIXEL  LEVEL 

Relaxation at the pixel level, as developed
thus far by the University of Maryland (References 1 and 2),
is concerned, among other things, with curve detection
and detection of light and dark regions in an image.
This algorithm classifies each pixel into
one of several classes by initially assigning to
each pixel a probability of possible class membership
based on local properties. The relaxation
process then uses information about label (class)
interactions on a local level to improve this
prior classification.

For the light-dark classification problem,
figure 1 shows a flow chart of the necessary computations,
which are now described. Suppose $\lambda$, $\lambda'$
are the two classes, light and dark, for adjacent
pixels, i.e., $(\lambda\lambda')$ can be (light, dark), (light,
light), etc. Then an initial probability estimate
for pixel xy being in class $\lambda$ (assume $\lambda$ = dark
for this example) is

$$p_{xy}(\lambda) = (gl_{xy} - gl_{min}) / gl_r$$

The term $gl_{xy}$ is the gray level at pixel xy, the
term $gl_{min}$ is the minimum gray level over the entire
image, and $gl_r$ is the range of gray levels
over the image. An estimate of the probability of
any pixel in the image of n pixels having label $\lambda$ is

$$p(\lambda) = \frac{1}{n} \sum_{xy} p_{xy}(\lambda)$$

An estimate of the joint probability of a pair of adjacent
pixels, xy and x+i, y+j, having labels $\lambda$ and $\lambda'$ is

$$p_{ij}(\lambda\lambda') = \frac{1}{n} \sum_{xy} p_{xy}(\lambda) \, p_{x+i,y+j}(\lambda')$$

Computationally, xy is any
pixel position in the image; x+i, y+j stands for
each of the eight neighbors of the center pixel
x,y in a 3x3 window as shown in figure 2. The local
information about class interaction is being
developed here. As the 3x3 window moves over the
image, eight (8) different $p_{ij}(\lambda\lambda')$ expressions
are developed and accumulated, one for each of the
neighbor positions. Further, there are four (4)
possible $(\lambda\lambda')$ combinations, so there are actually
32 $p_{ij}(\lambda\lambda')$ expressions developed. Finally, the
compatibility coefficients

$$r_{ij}(\lambda\lambda') = p_{ij}(\lambda\lambda') / \left( p(\lambda) \, p(\lambda') \right)$$

can be calculated from these expressions. The expressions
$p_{ij}(\lambda\lambda')$, $p(\lambda')$, $p(\lambda)$, and $r_{ij}(\lambda\lambda')$ are referred
to in figure 1 as properties of the image frame
and must be obtained before the relaxation iterations,
which are now described, can be started.
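The image frame properties described above can be sketched in a few lines. This is an illustrative reading of the text, not the Westinghouse implementation: it assumes two classes with the light probability taken as the complement of the dark probability, and it averages each joint probability over the pixel pairs that actually lie inside the image (the text does not say how frame borders are treated). All names are hypothetical.

```python
NEIGHBORS = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]
CLASSES = ("dark", "light")


def initial_probabilities(image):
    """p_xy(dark) = (gl_xy - gl_min) / gl_r, as stated in the text."""
    flat = [g for row in image for g in row]
    gl_min = min(flat)
    gl_r = max(flat) - gl_min
    if gl_r == 0:
        raise ValueError("image has no gray-level range")
    probs = []
    for row in image:
        out_row = []
        for g in row:
            dark = (g - gl_min) / gl_r
            out_row.append({"dark": dark, "light": 1.0 - dark})  # assumed complement
        probs.append(out_row)
    return probs


def frame_properties(image):
    """Return (prior, joint, compat) keyed by class and by (i, j, lam, lam2)."""
    p = initial_probabilities(image)
    rows, cols = len(p), len(p[0])
    n = rows * cols
    prior = {lam: sum(p[y][x][lam] for y in range(rows) for x in range(cols)) / n
             for lam in CLASSES}
    joint, compat = {}, {}
    for i, j in NEIGHBORS:
        # Assumption: only pairs whose neighbor lies inside the frame are counted.
        pairs = [(y, x) for y in range(rows) for x in range(cols)
                 if 0 <= y + j < rows and 0 <= x + i < cols]
        for lam in CLASSES:
            for lam2 in CLASSES:
                total = sum(p[y][x][lam] * p[y + j][x + i][lam2] for y, x in pairs)
                joint[(i, j, lam, lam2)] = total / len(pairs)
                compat[(i, j, lam, lam2)] = (joint[(i, j, lam, lam2)]
                                             / (prior[lam] * prior[lam2]))
    return prior, joint, compat
```

For two classes and eight neighbor positions the `joint` dictionary holds the 32 expressions mentioned in the text.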


The rationale for the various expressions in
the relaxation iteration was described earlier (Reference 3).
To update the $p_i(\lambda)$ expression based on local
properties, eight (8) intermediate expressions

$$q_{ij}^{k+1}(\lambda) = p_i^k(\lambda) \sum_{\lambda'} p_j^k(\lambda') \, r_{ij}(\lambda\lambda')$$

are calculated, one for each neighbor of the center pixel
of the 3x3 window location. The superscript k refers to the
iteration. Further,

$$p_{ij}^{k+1}(\lambda) = q_{ij}^{k+1}(\lambda) \Big/ \sum_{\lambda'} q_{ij}^{k+1}(\lambda')$$

and

$$p_i^{k+1}(\lambda) = \frac{1}{n'} \sum_j p_{ij}^{k+1}(\lambda)$$

are computed successively. Figure 3 shows each of the
expressions expanded for two (2) classes, 1 and 2. Note that
$p_i^{k+1}(\lambda)$ is the updated version of $p_i^k(\lambda)$, where n' = 8
and corresponds to each of the eight neighbors of
a 3x3 neighborhood. These expressions $q_{ij}^{k+1}(\lambda)$,
$q_{ij}^{k+1}(\lambda')$, $p_{ij}^{k+1}(\lambda)$ and $p_i^{k+1}(\lambda)$ are referred to in
figure 1 as the iteration cycle. Consider now the
hardware implementation of image frame properties
and relaxation expressions in reverse order.
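One iteration cycle for a single pixel can be sketched as follows. The code follows the three steps named in the text (intermediate q terms, normalization to p_ij, averaging over the n' neighbors); the data layout, the function name `relax_pixel`, and the handling of a zero normalizer are assumptions made for illustration.

```python
def relax_pixel(p_center, p_neighbors, compat):
    """Return the updated class probabilities p_i^(k+1) for one pixel.

    p_center:    dict class -> p_i^k(class)
    p_neighbors: list of dicts, one per neighbor j, class -> p_j^k(class)
    compat:      list of dicts, one per neighbor j,
                 (class, class') -> r_ij(class, class')
    """
    classes = list(p_center)
    n_prime = len(p_neighbors)  # 8 for a 3x3 window
    updated = {lam: 0.0 for lam in classes}
    for p_j, r_ij in zip(p_neighbors, compat):
        # q_ij^(k+1)(lam) = p_i^k(lam) * sum over lam' of p_j^k(lam') r_ij(lam, lam')
        q = {lam: p_center[lam] * sum(p_j[lam2] * r_ij[(lam, lam2)] for lam2 in classes)
             for lam in classes}
        norm = sum(q.values())
        for lam in classes:
            # Assumption: if every q term is zero, keep the previous estimate.
            p_ij = q[lam] / norm if norm else p_center[lam]
            updated[lam] += p_ij / n_prime
    return updated
```

With every compatibility coefficient equal to 1 (no label interaction) the update leaves the probabilities unchanged, which is a convenient sanity check.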

IMPLEMENTATION WITH SYSTOLIC ARRAY ARCHITECTURE

Hardware implementation can be examined as an
array processor associated with the mathematical
computation structure or associated with the geometry
of a 3x3 window. For the time being the
former approach is used. The expressions $q_{ij}^{k+1}(\lambda)$
and $q_{ij}^{k+1}(\lambda')$ may be expressed as matrix multiplications
as shown in figure 4. The complete expression
is shown in figure 5, where the $p_i^k$
term is really a pair of scalars in which the
$p_i^k(\lambda)$ term is a scalar for the first row and
$p_i^k(\lambda')$ is a scalar for the second row. So the
first part of the relaxation iteration structure
has been reduced to the case of a scalar multiplied
by a matrix multiplied by a vector. An appropriate
processor array architecture for this form is
the Systolic Array (Reference 3).
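For two classes, 1 and 2, the scalar-matrix-vector structure can be written out explicitly. The following is a reconstruction from the definition of the intermediate term q (the product of the center-pixel probability with the sum over neighbor labels of probability times compatibility), not a reproduction of figures 4 and 5; writing the pair of row scalars as a diagonal matrix is a notational assumption.

```latex
\begin{bmatrix} q_{ij}^{k+1}(1) \\ q_{ij}^{k+1}(2) \end{bmatrix}
=
\begin{bmatrix} p_i^k(1) & 0 \\ 0 & p_i^k(2) \end{bmatrix}
\begin{bmatrix} r_{ij}(1,1) & r_{ij}(1,2) \\ r_{ij}(2,1) & r_{ij}(2,2) \end{bmatrix}
\begin{bmatrix} p_j^k(1) \\ p_j^k(2) \end{bmatrix}
```

Each row of the result is one scalar times one row of the compatibility matrix times the neighbor probability vector, so the matrix-vector product can be formed first and the row scalar applied as a final multiplication.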

Consider a linear array of processors, each
capable of communicating with the others and each of
which has three input ports and two output ports.
Further, each is capable of a multiply, an add,
and holding three pieces of data in internal registers
as shown in figure 6.

The $p_j^k(1)$, $p_j^k(2)$ terms enter from the left
and move to the right through the array; the
$q_{ij}^{k+1}(\lambda)$, $q_{ij}^{k+1}(\lambda')$ terms move from right to left
through the array; and the matrix $r_{ij}(\lambda\lambda')$ terms
move from the top to the bottom of the array and
are staggered in time with regard to their entry.
Then for a series of clock cycles, the data can
be followed through the array as shown in figure 7.

At clock cycle 1, $p_j^k(1)$ enters from the left and
$q_{ij}^{k+1}(\lambda)$ enters from the right. At clock cycle 2,
$r_{ij}(\lambda,1)$ enters the middle processor along with
$p_j^k(1)$ and $q_{ij}^{k+1}(\lambda)$; $p_j^k(1) \cdot r_{ij}(\lambda,1)$ is formed and
placed in $q_{ij}^{k+1}(\lambda)$. At clock cycle 3, $r_{ij}(\lambda,2)$
enters the left processor as do $p_j^k(2)$ and $q_{ij}^{k+1}(\lambda)$.
Here $p_j^k(2) \cdot r_{ij}(\lambda,2)$ is formed and added to $q_{ij}^{k+1}(\lambda)$,
resulting in $q_{ij}^{k+1}(\lambda) = r_{ij}(\lambda,1) \cdot p_j^k(1) + r_{ij}(\lambda,2) \cdot p_j^k(2)$.
Similarly, in the right processor, $r_{ij}(\lambda',1) \cdot p_j^k(1)$
is formed and placed in $q_{ij}^{k+1}(\lambda')$. At the next
clock cycle, $q_{ij}^{k+1}(\lambda)$ exits from the array and the
two terms of $q_{ij}^{k+1}(\lambda')$ are formed in the center
processor. Clock cycle 5 shows the matrix component
$r_{ij}(\lambda',2)$ completely clear of the array
with $q_{ij}^{k+1}(\lambda)$ and $p_j^k(1)$ also out of the array.

Since the last step in forming $q_{ij}^{k+1}(\lambda)$ is a scalar
multiplication by $p_i^k(\lambda)$, another processor, with
the same internal structure, added to the array
can accomplish this, as shown in figure 8. The
array structure can be used to compute $p_{ij}^{k+1}(\lambda)$
since it is also of the form of a scalar
multiplied by a matrix multiplied by a vector.

Referring to figure 1, and the expressions
for $q_{ij}^{k+1}(\lambda)$ and $p_{ij}^{k+1}(\lambda)$, it is noted that the Systolic
Array shown in figure 7 must be repeated for
each of the eight neighbors of the center pixel,
yielding 32 processors, including scalar multiplication.
Consider now the implementation of Systolic
Arrays.

IMPLEMENTATION WITH UNIVERSAL ARRAYS

To avoid full frame storage it is necessary
to complete the relaxation iterations as close to
the frame used to compile the image statistics as
possible. Secondly, as the classes expand, the required
computations grow combinatorially. An arbitrary
hardware goal at this point in the development
is to complete relaxation in as close to real
time as possible. With reference to the Systolic
Array and the computation of the $q_{ij}^{k+1}(\lambda)$, $q_{ij}^{k+1}(\lambda')$
terms, this means completing three or four multiplications,
three adds, and four shifts in approximately
100 nanoseconds for a typical 500x600 pixel
frame size. The clock cycle governing the Systolic
Array is then of the order of 10 nanoseconds. As
the speed requirement becomes clearer and the small
volume constraint is unchanged, the technology being
developed for the next generation airborne signal
processors becomes more significant. The
Universal Arrays are now described.
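The 100 nanosecond and 10 nanosecond figures can be checked with simple arithmetic. The sketch below assumes a 30 frame per second video rate and assumes the eleven operations (four multiplications, three adds, four shifts) occupy successive clock cycles; neither assumption is stated in the text, and the function names are hypothetical.

```python
def per_pixel_budget_ns(width, height, frames_per_second):
    """Time available per pixel, in nanoseconds, at a given frame rate."""
    return 1e9 / (width * height * frames_per_second)


def clock_cycle_ns(budget_ns, multiplies=4, adds=3, shifts=4):
    """Clock period if each operation takes one cycle (assumed sequential)."""
    return budget_ns / (multiplies + adds + shifts)


budget = per_pixel_budget_ns(500, 600, 30)  # 30 frames/s is an assumed rate
cycle = clock_cycle_ns(budget)
print(f"per-pixel budget: {budget:.1f} ns, clock cycle: {cycle:.1f} ns")
```

Under these assumptions the budget is about 111 ns per pixel and the clock period about 10 ns, in line with the orders of magnitude quoted above.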

Universal Arrays (Reference 4), or functional arrays, are not
to be confused with gate arrays, of which the field
programmable logic array (FPLA) is an example, or
the gate arrays currently used in main frame computers.
The Universal Array consists of various
transistor and resistor parts and is divided into
functional cells rather than individual gates. The
Westinghouse ECL Universal Array is a combination
of diffusions and ion implantations which define a
fixed set of transistors and resistors arranged in
three types of cell groupings: internal logic cell,
input/output cell, and input cell. Each type of
cell is customized by configuring the two level
metal interconnect pattern.


The array is composed of 48 internal cells in
a 6x8 matrix, 24 input/output cells, and 16 input
cells. The chip size is 228 mils by 240 mils,
with power dissipation of 2.5 watts, and propagation
delays of less than 6 nanoseconds. The chip
is shown in figure 8, and figure 9 shows the internal
logic cell. The array can be customized
into a number of so-called "personalizations." Under
an Air Force contract, a number of personalizations
will be produced and one is in probe test
now.

This personalization is of great interest to
us in implementing Systolic Arrays for relaxation
because it is a 4x4 self-contained multiply chip
with an 8-bit product obtained in 12 nanoseconds.
This is of the order of time needed to do relaxation
in real time, and 32 of these chips would fit
in an area approximately 1-1/2 inches by 1-1/2
inches.

In future work, we shall be applying the
Westinghouse Universal Array to relaxation and other
University of Maryland algorithms in an attempt
to approach real time processing capability and
still stay in a small volume.
ABSTRACT

The  SPARC  program  will  result  in  a breadboard 
model  of  a high  performance  processor  for  image 
processing  applications.  This  research  project  in 
computer  architecture  is  being  jointly  performed 
by  Control  Data  Corporation  and  Carnegie-Mellon 
University.  The  project  is  currently  in  the  final 
design  phase,  with  fabrication  beginning.  Results 
of  trial  processor  coding  are  encouraging. 


INTRODUCTION 

The  SPARC  program  is  a research  effort  in 
computer  architecture.  Begun  by  ARPA  in  1977,  the 
program  involves  the  design  and  fabrication  of  a 
breadboard  model  of  a high-performance  digital 
processor  for  use  in  image  processing  research. 
SPARC  is  a flexible,  highly  parallel  computer, 
which  may  be  used  singly  to  offload  computational 
tasks  from  a host  general-purpose  machine.  The 
design  also  includes  features  which  facilitate 
communication  between  multiple  SPARC  processors  in 
array configurations for the large number of applications
where processing power exceeding the
capability  of  a single  machine  is  required.  The 
work  is  being  performed  as  a joint  effort  between 
Control  Data  Corporation  and  Carnegie-Mellon 
University,  with  CDC  responsible  for  hardware 
development  and  basic  system  software,  while  CMU 
is  concerned  primarily  with  user  system  software 
aids. CDC is concurrently supporting the construction
of a second processor, and more extensive
software development.

The  goal  of  the  present  phase  of  the  program 
is  to  have  SPARC  processors  installed  and  operating 
at  both  CDC  and  CMU  in  the  fall  of  1979.  At  the 
present  time,  the  hardware  design  is  in  its  final 
stages.  The  electronic  components  are  on  order. 
Circuit  board  layout  is  in  progress.  Cabinetry  is 
being  assembled.  Software  work  is  in  progress, 
with  the  main  efforts  being  concentrated  on  the 
development  of  a SPARC  cross-assembler,  a register- 
level  simulator,  a basic  operating  system,  and 
diagnostic  programs.  Trial  coding  of  several 
applications  algorithms  has  been  performed.  Work 
also  continues  on  multiple  processor  array 
architecture  and  facilities. 


SPARC  ARCHITECTURE 

A brief review of the SPARC processor architecture
will facilitate understanding of the
application example to be discussed later.

The  basic  processor,  shown  in  block  diagram 
form  in  Figure  1,  consists  of  a number  of  hardware 
blocks,  called  functional  units,  each  of  which  is 
designed  to  perform  a certain  set  of  operations  on 
input  data.  The  initial  SPARC  machine  will  contain 
six  different  unit  types,  plus  a control  unit  to 
supervise  and  coordinate  the  operation  of  the 
processor.  These  units  perform  such  operations  as 
integer  addition,  multiplication,  shifting,  bit-by- 
bit  logical  functions,  data  storage,  and  input/ 
output. The units are interconnected by a generalized
crosspoint switching network, from which all
units receive input operands and to which results
are delivered. This crossbar switch, which is
reconfigurable at a machine cycle rate, provides an
internal data transfer capability of approximately
1 x 10^10 bits per second.

Both  the  control  and  data  interfaces  between 
the  functional  units  and  the  remainder  of  the 
processor  are  completely  generalized,  thus  allowing 
the  addition  or  substitution  of  different  units  as 
required  to  perform  various  specialized  tasks. 

Units known as ring ports, designed for interprocessor
communications in multiprocessor ring
arrays,  and  memory  access  units,  which  provide  links 
to  high-capacity  system  memories,  are  currently  in 
design.  Other  units  proposed  for  implementation 
include  an  FFT  unit,  which  would  implement  the 
butterfly  type  operation  of  the  transform,  and  a 
floating  point  unit,  which  would  provide  hardware 
floating  point  arithmetic  capability. 


APPLICATION  CODING  RESULTS 

The kernels of several algorithms from image
processing applications have been coded for a SPARC
type processor in order to obtain estimates of
processor performance. These include a linear
interpolation process, called Warp Interpolation,
in which a value is calculated for a hypothetical
image pixel utilizing the magnitude and location of
four adjacent pixels; a Dot Product algorithm,
which forms sums of various cross products between
elements of two images in order to produce
measures of correlation; a photonormalization
process; and an FFT butterfly using 32-bit complex
arithmetic. These algorithms were chosen, in
part, because code for them exists for the Flexible
Processor, a current CDC machine which has been
used as the basis of several image processing
systems, thus enabling a direct performance comparison
to be made.
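The Warp Interpolation kernel is described only as a linear interpolation from four adjacent pixels. A minimal sketch of one common form, bilinear interpolation, is given below; the exact formula used in the SPARC trial code is not given here, so the weighting scheme, the function name `bilinear`, and the clamping at the image edge are assumptions.

```python
import math


def bilinear(image, x, y):
    """Interpolate a value at fractional position (x, y).

    image is a list of rows indexed as image[row][column]; x is the column
    coordinate and y the row coordinate. The four surrounding pixels are
    weighted by their distance from (x, y).
    """
    x0, y0 = math.floor(x), math.floor(y)
    # Assumption: neighbors are clamped to the last row/column at the edge.
    x1 = min(x0 + 1, len(image[0]) - 1)
    y1 = min(y0 + 1, len(image) - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom
```

Each interpolated value costs a handful of multiplications and additions, the kind of short arithmetic kernel that the parallel functional units described earlier are meant to execute.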

Table  1 summarizes  the  performance  comparisons 
between  the  two  machines.  A SPARC  type  machine 
with  array  processing  features,  such  as  ring  ports 
and  memory  access  units,  was  used,  in  order  to 
obtain  better  correlation  with  the  FP  code.  The 
results are encouraging, showing performance increase
factors ranging between 17 and 59, coupled
with a slight decrease in average coding time.


ARRAY  PROCESSING  ENHANCEMENTS 

Mention has been made in this report of
functional units under development at CDC which
are intended to provide the communications facilities
necessary to use multiple SPARC processors
in array configurations, for applications requiring
compute power measured in billions of operations
per second. One such unit is the Ring Port,
which serves as an interface between the processor
and a ring communications system, which has been
described in reference 1.

A second  type  of  unit  is  intended  to  provide 
high  bandwidth  access  for  processors  to  megabyte 
size  common  system  memories.  This  unit  will 
feature  800  M-bit  bandwidth,  maximum  addressing 
capability  of  2®®  bytes,  and  multiple  modes  of 
operation,  including  stream  modes,  in  which  fetch 
operations  and  address  updating  are  controlled  by 
the  memory  access  unit  hardware,  and  performed  at 
a rate  sufficient  to  keep  the  processor  supplied 
with  data.  These  units  not  only  provide  processor 
access  to  large  amounts  of  memory,  but  serve  as 
an  additional  communication  device  for  multiple 
processors  sharing  common  external  storage. 


CONCLUSION 

The SPARC program, jointly performed by CDC
and CMU, is progressing toward the goal of developing
a high-performance, modular, digital processor
architecture with application to image processing
problems requiring computational capability in
the 10® - 10^10 operations per second range. The
architecture employed enables systems of one or
many processors to be easily configured (and
reconfigured) to meet various processing requirements.
Facilities are provided to accommodate
system components, such as memory devices, which
are required to support the computing power of
multiprocessor arrays.

The authors wish to acknowledge the contributions
of many engineering, software, and management
personnel at CDC, along with the significant
contributions of the staff at CMU. The combined
efforts of all concerned continue to be of great
benefit to the project.
STATUS

The  electronic  parts  for  the  prototype  SPARC 
computers  are  on  order.  Delivery  is  scheduled  for 
early  summer.  Qualification  lots  of  the  three  new 
LSI  arrays  designed  for  SPARC  are  in  fabrication. 
Final  design  clean-up  and  simulation  is  in 
progress,  and  circuit  board  layout  has  begun.  The 
cabinetry, including power supplies and refrigeration
hardware, is being assembled.

In  the  software  area,  overall  objectives  for 
the  initial  versions  of  simulator,  cross-assembler, 
diagnostics  and  operating  system  have  been 
determined.  Coding  effort  is  currently  taking 
place  in  these  areas.  Work  has  been  initiated  at 
CDC  toward  the  definition  and/or  selection  of  a 
potential  high-level  language  for  the  processor. 
TABLE I. SOME SPARC*/AFP COMPARISONS

                                                  ALGORITHM
                                        Warp           Dot      Photo-     FFT
                                        Interpolation  Product  Normalize  Butterfly

NUMBER OF MICROINSTRUCTIONS     FP
IN KERNEL LONGEST PATH          SPARC*

EXECUTION TIME FOR              FP
LONGEST PATH (in microseconds)  SPARC*

TOTAL NUMBER OF                 FP
MICROINSTRUCTIONS               SPARC*

MAN-WEEKS TO CODE               FP
KERNEL LONGEST PATH             SPARC*

7.50  6.375

0.34  0.36

* A SPARC type machine with array processing enhancements